This RFC is aimed at introducing the buffer sharing framework for review.
Since almost all the discussion about buffer-sharing objects happened on the linaro-mm-sig list, I am sending it first for review within this list, and will share it with other lists after the first set of review comments from here.
--
This is the first step in defining a buffer sharing framework. A new buffer object is added, with hooks to allow for easy sharing of this buffer object across devices.
The idea was first mooted at the Linaro memory management mini-summit in Budapest in May 2011, as part of multiple things needed for a 'unified memory management framework'. It took a more concrete shape at Linaro memory-management mini-summit in Cambridge, Aug 2011.
The framework allows:
- a new buffer-object to be created, which associates a file pointer with each user-buffer and associated allocator-defined operations on that buffer. This operation is called the 'export' operation.
- this exported buffer-object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across.
- a received fd to get the buffer object back, where it can be accessed using the associated allocator-defined operations.
- the exporter and importer to share the scatterlist using get_ and put_ operations.
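To make the export/share/import flow above concrete, here is a small userspace model of the lifecycle. This is only a sketch, not the kernel code: the mock_ names are made up, the "fd table" is a plain array standing in for the anon-inode file table, and an int stands in for the file's refcount.

```c
#include <stddef.h>
#include <stdlib.h>

struct mock_dma_buf;

struct mock_dma_buf_ops {
	/* allocator-defined operations, mirroring the RFC's dma_buf_ops */
	void *(*get_scatterlist)(struct mock_dma_buf *buf);
	void (*put_scatterlist)(struct mock_dma_buf *buf, void *sg);
	void (*release)(struct mock_dma_buf *buf);
};

struct mock_dma_buf {
	int refcount;			/* stands in for the file's refcount */
	struct mock_dma_buf_ops *ops;
	void *priv;			/* allocator private data */
};

#define MOCK_FD_MAX 16
static struct mock_dma_buf *fd_table[MOCK_FD_MAX];

/* 'export': wrap allocator priv + ops in a refcounted buffer object */
static struct mock_dma_buf *mock_export(void *priv,
					struct mock_dma_buf_ops *ops)
{
	struct mock_dma_buf *buf = calloc(1, sizeof(*buf));
	if (!buf)
		return NULL;
	buf->refcount = 1;
	buf->ops = ops;
	buf->priv = priv;
	return buf;
}

/* hand out an 'fd' naming the buffer, so another driver can import it */
static int mock_fd(struct mock_dma_buf *buf)
{
	for (int fd = 0; fd < MOCK_FD_MAX; fd++) {
		if (!fd_table[fd]) {
			fd_table[fd] = buf;
			buf->refcount++;	/* the fd holds a reference */
			return fd;
		}
	}
	return -1;
}

/* 'import': look up the buffer behind an fd and take a reference */
static struct mock_dma_buf *mock_get(int fd)
{
	if (fd < 0 || fd >= MOCK_FD_MAX || !fd_table[fd])
		return NULL;
	fd_table[fd]->refcount++;
	return fd_table[fd];
}

/* drop a reference; the last put runs the allocator's release hook */
static void mock_put(struct mock_dma_buf *buf)
{
	if (--buf->refcount == 0) {
		if (buf->ops->release)
			buf->ops->release(buf);
		free(buf);
	}
}
```

In the real patch the refcounting is done by the VFS (fget/fput on the anon file); the model above only shows the intended ownership rules.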
Some file operations are provided as wrappers over the allocator-defined operations, which allows usage of fops (e.g. mmap) on the associated 'fd'.
This is based on design suggestions from many people at both the mini-summits, most notably from Arnd Bergmann arnd@arndb.de, and Rob Clark rob@ti.com.
The implementation is inspired by the proof-of-concept patch-set from Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing between two v4l2 devices.
Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
---
 drivers/base/Kconfig    |   10 +++
 drivers/base/Makefile   |    1 +
 drivers/base/dma-buf.c  |  196 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |  105 +++++++++++++++++++++++++
 4 files changed, 312 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-buf.c
 create mode 100644 include/linux/dma-buf.h
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..5398ce8 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,14 @@ config SYS_HYPERVISOR
 	bool
 	default n
+config DMA_SHARED_BUFFER
+	bool "Buffer framework to be shared between drivers"
+	default n
+	depends on ANON_INODES
+	help
+	  This option enables the framework for buffer-sharing between
+	  multiple drivers. A buffer is associated with a file using driver
+	  APIs extension; the file's descriptor can then be passed on to other
+	  driver.
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..bd95732 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS) += devtmpfs.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
+obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
 obj-$(CONFIG_ISA)	+= isa.o
 obj-$(CONFIG_FW_LOADER)	+= firmware_class.o
 obj-$(CONFIG_NUMA)	+= node.o
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
new file mode 100644
index 0000000..e35b385
--- /dev/null
+++ b/drivers/base/dma-buf.c
@@ -0,0 +1,196 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2011 Texas Instruments Inc. All rights reserved.
+ * Author: Sumit Semwal <sumit.semwal@ti.com>
+ *
+ * Many thanks to linaro-mm-sig list, and specially
+ * Arnd Bergmann <arnd@arndb.de> and Rob Clark <rob@ti.com>
+ * for their support in creation and refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/dma-buf.h>
+#include <linux/anon_inodes.h>
+
+static int is_dma_buf_file(struct file *);
+
+/* file operation wrappers for dma buf ops */
+static ssize_t dma_buf_read(struct file *file, char __user *buf, size_t size,
+			    loff_t *offset)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+	return dmabuf->ops->read(dmabuf, buf, size);
+}
+
+static ssize_t dma_buf_write(struct file *file, char __user *buf, size_t size,
+			     loff_t *offset)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+	return dmabuf->ops->write(dmabuf, buf, size);
+}
+
+static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+	return dmabuf->ops->mmap(dmabuf, vma);
+}
+
+static int dma_buf_release(struct inode *inode, struct file *file)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	dmabuf->ops->release(dmabuf);
+	kfree(dmabuf);
+	return 0;
+}
+
+static const struct file_operations dma_buf_fops = {
+	.mmap		= dma_buf_mmap,
+	.read		= dma_buf_read,
+	.write		= dma_buf_write,
+	.release	= dma_buf_release,
+};
+
+/*
+ * is_dma_buf_file - Check if struct file* is associated with dma_buf
+ */
+static inline int is_dma_buf_file(struct file *file)
+{
+	return file->f_op == &dma_buf_fops;
+}
+
+/**
+ * dma_buf_export - Creates a new dma_buf, and associates an anon file
+ * with this buffer, so it can be exported.
+ * Also 'attach' the allocator specific data and ops to the buffer.
+ *
+ * @priv:	Attach private data of allocator to this buffer
+ * @ops:	Attach allocator-defined dma buf ops to the new buffer.
+ *
+ * Returns, on success, a newly created dma_buf object, which wraps the
+ * supplied private data and operations for dma_buf_ops. On failure to
+ * allocate the dma_buf object, it can return NULL.
+ *
+ */
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops)
+{
+	struct dma_buf *dmabuf;
+	struct file *file;
+
+	BUG_ON(!priv || !ops);
+
+	dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
+	if (dmabuf == NULL)
+		return dmabuf;
+
+	dmabuf->priv = priv;
+	dmabuf->ops = ops;
+
+	file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, 0);
+
+	dmabuf->file = file;
+	file->private_data = dmabuf;
+
+	return dmabuf;
+}
+EXPORT_SYMBOL(dma_buf_export);
+
+/**
+ * dma_buf_fd - returns a file descriptor for the given dma_buf
+ * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
+ *
+ * On success, returns an associated 'fd'. Else, returns error.
+ */
+int dma_buf_fd(struct dma_buf *dmabuf)
+{
+	int error, fd;
+
+	if (!dmabuf->file)
+		return -EINVAL;
+
+	error = get_unused_fd_flags(0);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	fd_install(fd, dmabuf->file);
+
+	return fd;
+}
+EXPORT_SYMBOL(dma_buf_fd);
+
+/**
+ * dma_buf_get - returns the dma_buf structure related to an fd
+ * @fd:	[in]	fd associated with the dma_buf to be returned
+ *
+ * On success, returns the dma_buf structure associated with an fd; uses
+ * file's refcounting done by fget to increase refcount. returns ERR_PTR
+ * otherwise.
+ */
+struct dma_buf *dma_buf_get(int fd)
+{
+	struct file *file;
+
+	file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EBADF);
+
+	if (!is_dma_buf_file(file)) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return file->private_data;
+}
+EXPORT_SYMBOL(dma_buf_get);
+
+/**
+ * dma_buf_put - decreases refcount of the buffer
+ * @dmabuf:	[in]	buffer to reduce refcount of
+ *
+ * Uses file's refcounting done implicitly by fput()
+ */
+void dma_buf_put(struct dma_buf *dmabuf)
+{
+	BUG_ON(!dmabuf->file);
+
+	fput(dmabuf->file);
+
+	return;
+}
+EXPORT_SYMBOL(dma_buf_put);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
new file mode 100644
index 0000000..c22896a
--- /dev/null
+++ b/include/linux/dma-buf.h
@@ -0,0 +1,105 @@
+/*
+ * Header file for dma buffer sharing framework.
+ *
+ * Copyright(C) 2011 Texas Instruments Inc. All rights reserved.
+ * Author: Sumit Semwal <sumit.semwal@ti.com>
+ *
+ * Many thanks to linaro-mm-sig list, and specially
+ * Arnd Bergmann <arnd@arndb.de> and Rob Clark <rob@ti.com>
+ * for their support in creation and refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_BUF_H__
+#define __DMA_BUF_H__
+
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/scatterlist.h>
+
+struct dma_buf;
+
+/**
+ * struct dma_buf_ops - operations possible on struct dmabuf
+ * @get_scatterlist: returns list of scatter pages allocated, increases
+ *		     usecount of the buffer
+ * @put_scatterlist: decreases usecount of buffer, might deallocate scatter
+ *		     pages
+ * @mmap: map this buffer
+ * @read: read from this buffer
+ * @write: write to this buffer
+ * @release: release this buffer; to be called after the last dma_buf_put
+ * @sync_sg_for_cpu: sync the sg list for cpu
+ * @sync_sg_for_device: synch the sg list for device
+ */
+struct dma_buf_ops {
+	/* allow buffer to not be pinned when DMA is not happening */
+	struct scatterlist * (*get_scatterlist)(struct dma_buf *);
+	void (*put_scatterlist)(struct dma_buf *, struct scatterlist *);
+
+	/* allow allocator to mmap/read/write to take care of cache attrib */
+	int (*mmap)(struct dma_buf *, struct vm_area_struct *);
+	ssize_t (*read)(struct dma_buf *, void *, size_t);
+	ssize_t (*write)(struct dma_buf *, void *, size_t);
+	/* after final dma_buf_put() */
+	void (*release)(struct dma_buf *);
+
+	/* allow allocator to take care of cache ops */
+	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
+	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
+};
+
+/**
+ * struct dma_buf - shared buffer object
+ * @file: file pointer used for sharing buffers across, and for refcounting.
+ * @ops: dma_buf_ops associated with this buffer object
+ * @priv: user specific private data
+ */
+struct dma_buf {
+	struct file		*file;
+	struct dma_buf_ops	*ops;
+	void			*priv;
+};
+
+#ifdef CONFIG_DMA_SHARED_BUFFER
+
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops);
+int dma_buf_fd(struct dma_buf *dmabuf);
+struct dma_buf *dma_buf_get(int fd);
+void dma_buf_put(struct dma_buf *dmabuf);
+
+#else
+static inline struct dma_buf *dma_buf_export(void *priv,
+					     struct dma_buf_ops *ops)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline int dma_buf_fd(struct dma_buf *dmabuf)
+{
+	return -ENODEV;
+}
+
+static inline struct dma_buf *dma_buf_get(int fd)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void dma_buf_put(struct dma_buf *dmabuf)
+{
+	return;
+}
+#endif /* CONFIG_DMA_SHARED_BUFFER */
+
+#endif /* __DMA_BUF_H__ */
On Wed, Aug 10, 2011 at 7:11 AM, Sumit Semwal sumit.semwal@ti.com wrote:
> +/**
> + * struct dma_buf_ops - operations possible on struct dmabuf
> + * @get_scatterlist: returns list of scatter pages allocated, increases
> + *		     usecount of the buffer
> + * @put_scatterlist: decreases usecount of buffer, might deallocate scatter
> + *		     pages
> + * @mmap: map this buffer
> + * @read: read from this buffer
> + * @write: write to this buffer
> + * @release: release this buffer; to be called after the last dma_buf_put
> + * @sync_sg_for_cpu: sync the sg list for cpu
> + * @sync_sg_for_device: synch the sg list for device
> + */
> +struct dma_buf_ops {
> +	/* allow buffer to not be pinned when DMA is not happening */
> +	struct scatterlist * (*get_scatterlist)(struct dma_buf *);
> +	void (*put_scatterlist)(struct dma_buf *, struct scatterlist *);
One thought: {get,put}_scatterlist() gives the allocating driver half of what it needs to implement a sync object. I.e. if the receiving driver does a "get" before DMA and a "put" on completion, this tells the allocating driver that the buffer is in use. If we added an 'op' parameter to get_scatterlist() which was READ &/| WRITE, this would even tell the allocator the nature of the DMA, which could later be used to implement a wait(op).
Another thought, at the risk of re-opening a can of worms: an 'attrib' flag/mask as a param to get_scatterlist() could allow the allocator to defer backing the buffer w/ real memory until the first get_scatterlist() call. For now the attrib flag could be discontig, contig, and weird (as Hans suggested); we can try to define "weird" more precisely later, but for now it can have platform-specific meaning. This gives a sort of "poor man's negotiation" which might help on some platforms for now.
> +	/* allow allocator to mmap/read/write to take care of cache attrib */
> +	int (*mmap)(struct dma_buf *, struct vm_area_struct *);
> +	ssize_t (*read)(struct dma_buf *, void *, size_t);
> +	ssize_t (*write)(struct dma_buf *, void *, size_t);
> +	/* after final dma_buf_put() */
> +	void (*release)(struct dma_buf *);
> +
> +	/* allow allocator to take care of cache ops */
> +	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> +	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> +};

BR, -R
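The two ideas above (a direction 'op' and an 'attrib' mask on get_scatterlist) can be sketched in a userspace model. Everything here is hypothetical, not from the patch: the lazy_ names, the enums, and the single-int attrib are stand-ins for whatever the real negotiation would use.

```c
#include <stddef.h>
#include <stdlib.h>

enum sg_op     { SG_READ = 1, SG_WRITE = 2 };
enum sg_attrib { ATTRIB_DISCONTIG, ATTRIB_CONTIG, ATTRIB_WEIRD };

struct lazy_buf {
	size_t size;
	void *backing;		/* NULL until the first get_scatterlist() */
	int attrib;		/* layout fixed by the first get */
	int readers, writers;	/* outstanding DMA by direction */
};

static void *lazy_get_scatterlist(struct lazy_buf *buf, int op, int attrib)
{
	if (!buf->backing) {
		/* first user: now we know a constraint, so allocate */
		buf->backing = malloc(buf->size);
		buf->attrib = attrib;
	} else if (attrib != buf->attrib) {
		return NULL;	/* constraints clash: negotiation failed */
	}
	if (op & SG_READ)
		buf->readers++;
	if (op & SG_WRITE)
		buf->writers++;
	return buf->backing;
}

static void lazy_put_scatterlist(struct lazy_buf *buf, int op)
{
	if (op & SG_READ)
		buf->readers--;
	if (op & SG_WRITE)
		buf->writers--;
}

/* a wait(op) primitive could block on these counters; here we only
 * expose whether DMA of a given direction is outstanding */
static int lazy_busy(struct lazy_buf *buf, int op)
{
	return ((op & SG_READ) && buf->readers) ||
	       ((op & SG_WRITE) && buf->writers);
}
```

The per-direction counters are exactly the "half of a sync object" Rob describes: the allocator learns who is doing DMA and of what nature, without any explicit fence API.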
On Thu, Aug 11, 2011 at 05:02:21AM -0500, Clark, Rob wrote:
> On Wed, Aug 10, 2011 at 7:11 AM, Sumit Semwal <sumit.semwal@ti.com> wrote:
> > [struct dma_buf_ops quote snipped]
> One thought, {get,put}_scatterlist() gives the allocating driver half of what it needs to implement a sync object.. ie. if receiving driver does a "get" before DMA, and "put" on completion, this tells the allocating driver that the buffer is in use. If we added an 'op' to get_scatterlist which was READ &/| WRITE this would even tell the allocator the nature of the DMA which could later be used to implement a wait(op).
> Another thought.. at the risk of re-opening a can of worms.. an 'attrib' flag/mask as a param to get_scatterlist() could allow the allocator to defer backing the buffer w/ real memory until the first get_scatterlist() call, for now attrib flag could be discontig, contig, and weird (as Hans suggested) and we can try to define more precisely "weird" later but for now can have platform specific meaning perhaps. This gives a sort of "poor mans negotiation" which might help on some platforms for now..
You've probably discussed this all at the recent summit I've totally missed, but why not add a struct device * pointer argument to get_scatterlist? This way we could shovel all the platform specific complexity into platform specific code.
For devices that have different dma requirements depending upon function (i.e. contiguous for scanout, iommu-mapped with large pages for overlay image) we could use a bunch of subdevices to map all the different requirements.
Another similar idea could also solve the backing storage negotiation problem: A dma_buf object is at first just an abstract handle (with a fixed size), then you can attach users with dma_buf_attach_device. Then, after all users have themselves attached, the first get_scatterlist call actually allocates the backing storage for the buffer. A subsequent attach_device would then return -EBUSY.
This way we don't have a special device that allocates a buffer but instead all the knowledge about dma buffer handling sits in the platform specific dma code. With that design get_scatterlist should imo then also return an already mapped sg list, i.e. what dma_map_sg would return. Probably rename them to dma_map_buffer/dma_unmap_buffer.
In short, I imagine an api like this (neglecting the actual buffer sharing part and reference counting boiler plate):
struct dma_buf *dma_buf_create(size_t size);

int dma_buf_attach_device(struct dma_buf *buf, struct device *device);
/* might return -EBUSY to signal that the backing storage is
 * already allocated and incompatible with the device's
 * requirements */

struct scatterlist *dma_buf_map(struct dma_buf *buf,
				struct device *device, int *ents);
/* not allowed without a preceding attach_device */

void dma_buf_unmap(struct dma_buf *buf, struct device *device,
		   struct scatterlist *sg, int ents);
Yours, Daniel
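Daniel's attach-then-allocate scheme can be modeled in userspace to show the intended semantics. This is only a sketch under strong simplifications: device "requirements" are collapsed to a single contiguity flag, the nbuf_ names are invented, and -1 stands in for -EBUSY; the real arbitration would live in platform dma code.

```c
#include <stddef.h>
#include <stdlib.h>

struct mock_device { int needs_contig; };

struct nbuf {
	size_t size;
	void *storage;			/* NULL until the first map */
	int contig;			/* layout chosen at first map */
	int nattached;
	struct mock_device *attached[8];
};

static struct nbuf *nbuf_create(size_t size)
{
	struct nbuf *buf = calloc(1, sizeof(*buf));
	if (buf)
		buf->size = size;
	return buf;
}

static int nbuf_attach_device(struct nbuf *buf, struct mock_device *dev)
{
	/* storage already allocated and incompatible: refuse (-EBUSY) */
	if (buf->storage && dev->needs_contig && !buf->contig)
		return -1;
	if (buf->nattached >= 8)
		return -1;
	buf->attached[buf->nattached++] = dev;
	return 0;
}

static void *nbuf_map(struct nbuf *buf, struct mock_device *dev)
{
	int attached = 0;
	for (int i = 0; i < buf->nattached; i++)
		if (buf->attached[i] == dev)
			attached = 1;
	if (!attached)
		return NULL;	/* not allowed without attach_device */
	if (!buf->storage) {
		/* first map: satisfy the union of attached constraints */
		buf->contig = 0;
		for (int i = 0; i < buf->nattached; i++)
			if (buf->attached[i]->needs_contig)
				buf->contig = 1;
		buf->storage = malloc(buf->size);
	}
	return buf->storage;
}
```

Note the key property: a late attacher only fails if the already-chosen layout cannot satisfy it, which is exactly the case Rob worries about below with the video-record scenario.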
On Thu, Sep 1, 2011 at 8:19 AM, Daniel Vetter daniel@ffwll.ch wrote:
> On Thu, Aug 11, 2011 at 05:02:21AM -0500, Clark, Rob wrote:
> > [snip]
> You've probably discussed this all at the recent summit I've totally missed, but why not add a struct device * pointer argument to get_scatterlist? This way we could shovel all the platform specific complexity into platform specific code.
Ie. add in struct device_dma_parameters?
Well, I guess we still need a way to describe "weird" requirements.. but I guess having the right place to describe them is a good start.
> For devices that have different dma requirements depending upon function (i.e. contigous for scanout, iommu-mapped with large pages for overlay image) we could use a bunch of subdevices to map all the different requirements.
> Another similar idea could also solve the backing storage negotiation problem: A dma_buf object is at first just an abstract handle (with a fixed size), then you can attach users with dma_buf_attach_device. Then, after all users have themselves attached, the first get_scatterlist call actually allocates the backing storage for the buffer. A subsequent attach_device would then return -EBUSY.
The tricky thing is that we can't rely on all the devices seeing the buffer before the buffer starts getting used. Think video record app.. user launches app and it starts in preview mode. Then they select which format they want to encode to (ex. mpeg4 vs h264). Then they hit 'record'. And finally *now* it starts passing buffers to the encoder, the same buffers that were already circulating between the display and camera.
Well.. it is a hypothetical issue.. if the camera already has the same requirements as the encoder on some platform, then this wouldn't be a problem. So may or may not need to be solved in the first version. Certainly deferring allocating of backing pages helps. (It was also an idea Arnd had.)
BR, -R
On Thu, Sep 01, 2011 at 01:11:51PM -0500, Clark, Rob wrote:
> > You've probably discussed this all at the recent summit I've totally missed, but why not add a struct device * pointer argument to get_scatterlist? This way we could shovel all the platform specific complexity into platform specific code.
> Ie. add in struct device_dma_parameters?
> Well, I guess we still need a way to describe "weird" requirements.. but I guess having the right place to describe them is a good start.
I think that struct device is the right place because the iommu code already needs it there, so why not reuse it for the shared buffer layout arbitration? At least that was my thinking.
> > [snip]
> The tricky thing is that we can't rely on all the devices seeing the buffer before the buffer starts getting used. Think video record app.. user launches app and it starts in preview mode. Then they select which format they want to encode to (ex. mpeg4 vs h264). Then they hit 'record'. And finally *now* it starts passing buffers to the encoder, the same buffers that were already circulating between the display and camera.
> Well.. it is a hypothetical issue.. if the camera already has the same requirements as the encoder on some platform, then this wouldn't be a problem. So may or may not need to be solved in the first version. Certainly deferring allocating of backing pages helps. (It was also an idea Arnd had.)
I see two ways out of this:
- Either drop the current set of buffers and create a new set that can
  also be shared with the encoder.
- Or attach all devices that might ever need to touch that buffer before
  the first use.
But if we go with my 2-step scheme, that's strictly a userspace issue: In both cases we need to export a new attach method in every subsystem that uses shared buffers (the dma_buf_map is probably implicit on first real usage).
Imho the appeal of this is that there's no preferred party of a shared buffer (the one the buffer originated from and allocated it), but that is handled by the dma core (and some platform specific magic for really weird cases).
We could even go so far and make the dma_buf creation a real syscall instead of shoving it into some random ioctls.
Cheers, Daniel
On Fri, Sep 2, 2011 at 5:24 AM, Daniel Vetter daniel@ffwll.ch wrote:
> On Thu, Sep 01, 2011 at 01:11:51PM -0500, Clark, Rob wrote:
> > [snip]
> I think that struct device is the right place because the iommu code already needs it there, so why not reuse it for the shared buffer layout arbitration? At least that was my thinking.
yup, seems like a sane approach
> > [snip]
> I see two ways out of this:
> - Either drop the current set of buffers and create a new set that can
>   also be shared with the encoder.
> - Or attach all devices that might ever need to touch that buffer before
>   the first use.
> But if we go with my 2-step scheme, that's strictly a userspace issue: In both cases we need to export a new attach method in every subsystem that uses shared buffers (the dma_buf_map is probably implicit on first real usage).
But is it a solvable userspace problem? w/ something like GStreamer you have a lot of flexibility for dynamic pipeline changes. AFAIK apps like cheese utilize this. Ideally we wouldn't suddenly introduce new restrictions that make things break w/ hw accel path, compared to how userspace developers are developing w/ sw based encoders.
That said, I guess it is likely that different hw encoders would have same restrictions. So the 2-step approach seems like a pragmatic solution that would handle most cases, and maybe the cases that it doesn't handle are only hypothetical.
> Imho the appeal of this is that there's no preferred party of a shared buffer (the one the buffer originated from and allocated it), but that is handled by the dma core (and some platform specific magic for really weird cases).
> We could even go so far and make the dma_buf creation a real syscall instead of shoving it into some random ioctls.
hmm, I don't quite get why making it a syscall would help..
BR, -R
On Fri, Sep 2, 2011 at 10:21 AM, Clark, Rob rob@ti.com wrote:
> On Fri, Sep 2, 2011 at 5:24 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > [snip]
oh, and if the importing driver is bracketing DMA to/from the buffer w/ get/put scatter list, it should even be possible to implement more fancy reallocation or migration stuff later if it is actually needed, without changing dmabuf API. Or at least this is my theory.
The restriction is that get_sglist needs to be called only from a context that could block.
(And maybe needs to return some hint to the importing driver that the buffer location has/hasn't changed? I'm not sure.. if we are returning device addresses thanks to iommu and having passed the 'struct device' ptr to get_sglist(), then all the optimizations of caching the mapping of the buffer is all behind the scenes?)
BR, -R
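Rob's bracketing idea above can be illustrated with a small userspace mock. All the names here (my_buf, get_sglist, try_migrate) are invented for illustration and are not the proposed kernel API: while no importer holds a scatterlist, the exporter is free to relocate the backing storage; while one does, relocation must fail busy.

```c
/* Userspace mock of the get/put bracketing idea (illustrative names,
 * not the real dma-buf API). */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct my_buf {
    void *backing;  /* stand-in for the backing pages */
    size_t size;
    int users;      /* outstanding get_sglist() calls */
};

/* Importer brackets DMA with get/put; may sleep, per the restriction
 * that get_sglist is only called from a context that could block. */
void *get_sglist(struct my_buf *b)
{
    b->users++;
    return b->backing;  /* stand-in for a real scatterlist */
}

void put_sglist(struct my_buf *b)
{
    b->users--;
}

/* Exporter may relocate the backing storage only between brackets. */
int try_migrate(struct my_buf *b)
{
    if (b->users > 0)
        return -1;  /* -EBUSY in real life: DMA may be in flight */
    void *n = malloc(b->size);
    memcpy(n, b->backing, b->size);
    free(b->backing);
    b->backing = n;
    return 0;
}
```

The point is only that migration is legal exactly when the bracket count is zero, which is why later "fancy reallocation" needs no API change.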
Imho the appeal of this is that there's no preferred party of a shared buffer (the one the buffer originated from and allocated it), but that is handled by the dma core (and some platform specific magic for really weird cases).
We could even go so far and make the dma_buf creation a real syscall instead of shoving it into some random ioctls.
hmm, I don't quite get why making it a syscall would help..
BR, -R
Cheers, Daniel
Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48
On Fri, Sep 02, 2011 at 10:49:44AM -0500, Clark, Rob wrote:
(And maybe needs to return some hint to the importing driver that the buffer location has/hasn't changed? I'm not sure.. if we are returning device addresses thanks to iommu and having passed the 'struct device' ptr to get_sglist(), then all the optimizations of caching the mapping of the buffer is all behind the scenes?)
Yeah, that's the idea. It's only a small step from the 2-step api to something with (implicit) sync objects. The easiest way would be to add a flag to get_scatterlist to tell the core that the corresponding put_scatterlist will happen in a short time. And another flag for attach_device to wait for outstanding dma so that the buffer can be re-allocated. It would obviously still return -EBUSY if there are valid mappings with indefinite lifetime out there. -Daniel
On Friday 02 September 2011, Clark, Rob wrote:
We could even go so far and make the dma_buf creation a real syscall instead of shoving it into some random ioctls.
hmm, I don't quite get why making it a syscall would help..
It was indeed one of the main drivers for the current design to have no specific way to create a dma buffer but to let every subsystem handle it in its own way. That doesn't prevent you from adding a chardev, file system or syscall that only has the purpose of creating dma buffers, but it should not be essential to have that.
Arnd
On Fri, Sep 02, 2011 at 05:51:37PM +0200, Arnd Bergmann wrote:
iirc we've converged on that design because it's simpler and requires fewer changes in existing subsystems. But thinking more about this I'm not sure anymore whether this is a good trade-off if we want to handle the buffer negotiation problem. Imo that needs a privileged/central party for the buffer creation.
We certainly don't want to implement all that complexity right away, but should keep it in mind when designing the userspace api. E.g. the central allocator could easily (kernel-internally) fall back on the currently discussed scheme by simply allocating the buffer on the first attach_device (which would happen through a subsystem specific ioctl). -Daniel
Hi Daniel,
On 2 September 2011 21:48, Daniel Vetter daniel@ffwll.ch wrote:
Ok, so do you think the following as dma_buf_ops would be ok?
- create,
- attach_device,
- rename {get, put}_scatterlist to buffer_{map, unmap}, adding struct device*
Obviously, the 'release' callback should then take care of doing all book-keeping related to detaching struct device* as well.
Then we can have a 'central allocator' device which will use this framework to allow negotiation and sync'ing. Also in absence of this central allocator device, one of the subsystems can take up the role of allocator in the earlier-suggested way?
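As a rough sketch, the op table Sumit proposes above could look like this. The field names and the dummy attach implementation are guesses for illustration only, not the interface that was merged:

```c
/* Sketch of the proposed dma_buf_ops; names are illustrative guesses. */
#include <assert.h>

struct device;       /* opaque here */
struct scatterlist;  /* opaque here */
struct dma_buf;

struct dma_buf_ops {
    /* called when a new device wants to share the buffer */
    int (*attach_device)(struct dma_buf *buf, struct device *dev);

    /* renamed {get,put}_scatterlist, now taking the device */
    struct scatterlist *(*buffer_map)(struct dma_buf *buf,
                                      struct device *dev);
    void (*buffer_unmap)(struct dma_buf *buf, struct device *dev,
                         struct scatterlist *sg);

    /* must also undo per-device bookkeeping for attached devices */
    void (*release)(struct dma_buf *buf);
};

struct dma_buf {
    const struct dma_buf_ops *ops;
    int n_attached;  /* bookkeeping that release must undo */
};

/* Dummy allocator-side implementation, just to exercise the table. */
static int dummy_attach(struct dma_buf *buf, struct device *dev)
{
    (void)dev;
    buf->n_attached++;
    return 0;
}
```

Note that 'create' is deliberately absent from the table here; whether creation belongs in the ops or in generic dma_buf code is exactly what gets debated below.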
--
Thanks and regards,
Sumit Semwal
Linaro Kernel Engineer - Graphics working group
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook http://www.facebook.com/pages/Linaro | Twitter http://twitter.com/#!/linaroorg | Blog http://www.linaro.org/linaro-blog/
On Thu, Sep 8, 2011 at 4:57 AM, Sumit Semwal sumit.semwal@linaro.org wrote:
Ok, so do you think the following as dma_buf_ops would be ok? - create, - attach_device, - rename {get, put}_scatterlist to "buffer_{map, unmap}, adding struct device*
actually, all I think we needed was to add 'struct device *' to get_scatterlist (and maybe doesn't hurt for put).. I'm not entirely sure why we need create/attach_device? The create fxn especially seems wrong.. what if you are the 2nd device to import the buffer?
All the discussion about deferring allocation of backing pages could still be handled in the first get_scatterlist() call.
I think the interesting questions are:
1) what to add to 'struct device_dma_parameters'
2) how to handle things like "I need luma over here, chroma over there".. maybe this could be handled by making luma and chroma two buffers, and having some sub-devices?
Obviously, the 'release' callback should then take care of doing all book-keeping related to 'de-attaching' struct device* as well. Then we can have a 'central allocator' device which will use this framework to allow negotiation and sync'ing. Also in absence of this central allocator device, one of the subsystems can take up the role of allocator in the earlier-suggested way?
central allocator or not, I don't think that the importing device should know or care.. *who* allocates and *who* imports, doesn't really seem relevant to me for dmabuf API..
BR, -R
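On Rob's first question, one way the "union of requirements" across attached devices could be computed is sketched below. fake_dma_parms is an invented stand-in, not the kernel's struct device_dma_parameters, and the alignment/placement fields are hypothetical additions:

```c
/* Illustrative constraint merging; fake_dma_parms is NOT the kernel's
 * struct device_dma_parameters, just a sketch of the idea. */
#include <assert.h>
#include <stdint.h>

struct fake_dma_parms {
    uint64_t dma_mask;        /* addressable physical address bits */
    unsigned int max_segment; /* largest contiguous segment */
    unsigned int align;       /* required start alignment (power of 2) */
};

/* The buffer placement must satisfy every importer at once, so each
 * field takes the most restrictive value of the two. */
struct fake_dma_parms merge_parms(const struct fake_dma_parms *a,
                                  const struct fake_dma_parms *b)
{
    struct fake_dma_parms r;
    r.dma_mask = a->dma_mask & b->dma_mask;
    r.max_segment = a->max_segment < b->max_segment
                  ? a->max_segment : b->max_segment;
    r.align = a->align > b->align ? a->align : b->align;
    return r;
}
```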
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
Hi Rob,
On 8 September 2011 19:48, Clark, Rob rob@ti.com wrote:
On Thu, Sep 8, 2011 at 4:57 AM, Sumit Semwal sumit.semwal@linaro.org wrote:
actually, all I think we needed was to add 'struct device *' to get_scatterlist (and maybe doesn't hurt for put).. I'm not entirely sure why we need create/attach_device? The create fxn especially seems wrong.. what if you are the 2nd device to import the buffer?
Ah... I guess I misunderstood your 'it seems like a sane approach' to mean you agree with the 2-step process suggested by Daniel :) - my bad!
Since I'm a newbie to this domain, I thought both of you saw some scenario I couldn't. Now when I read your email again, I realised you were talking of 'not having to change the dmabuf API', so well, apologies for mis-reading.
central allocator or not, I don't think that the importing device should know or care.. *who* allocates and *who* imports, doesn't really seem relevant to me for dmabuf API..
Ok, you're right of course. The importer need not know at all on who's the allocator, and vice-versa, in context of dmabuf API, and that's what we started with.
On Thu, Sep 08, 2011 at 08:45:43PM +0530, Sumit Semwal wrote:
On 8 September 2011 19:48, Clark, Rob rob@ti.com wrote:
actually, all I think we needed was to add 'struct device *' to get_scatterlist (and maybe doesn't hurt for put).. I'm not entirely sure why we need create/attach_device? The create fxn especially seems wrong.. what if you are the 2nd device to import the buffer?
Ah... I guess I misunderstood your 'it seems like a sane approach' to mean you agree with the 2-step process suggested by Daniel :) - my bad!
On re-reading Rob's response I think there's a misunderstanding. I don't want a create function in the dma_buf_ops. The create would be a generic dma_buf subsystem function that (like in the patch) allocates the struct and properly registers the fd and what not else. The attach_device would then be the only new function in dma_buf_ops.
The confusion probably stems from my other proposal to hide these function pointers behind small inline helpers for the in-kernel api. -Daniel
On Thu, Sep 8, 2011 at 10:55 AM, Daniel Vetter daniel@ffwll.ch wrote:
I guess/assume that what you are trying to get to is a way to differentiate between "I am done DMA'ing to the buffer now, but will likely use it again (ie. feel free to cache the mapping)", and "I am done forever w/ the buffer (don't bother caching)"?
BR, -R
On Thu, Sep 08, 2011 at 12:39:41PM -0500, Clark, Rob wrote:
I guess/assume that what you are trying to get to is a way to differentiate between "I am done DMA'ing to the buffer now, but will likely use it again (ie. feel free to cache the mapping)", and "I am done forever w/ the buffer (don't bother caching)"?
Hm, no, I haven't really thought much about the release side. dri2/gem doesn't have any automatic caching of shared buffers, btw, that's all done manually (i.e. in the higher levels like the mesa/ddx drivers) by keeping track of recently shared buffers (and hanging onto them for as long as possible without creating havoc on the other side). -Daniel
On Thu, Sep 8, 2011 at 3:28 PM, Daniel Vetter daniel@ffwll.ch wrote:
ok, then I've lost sync.. (ok, I admit to be only having time to read email for 5 min here and there, which might be contributing to the problem)..
what was the purpose of attach() then (which couldn't be accomplished in the first get_scatterlist())?
BR, -R
On Thu, Sep 08, 2011 at 05:56:19PM -0500, Clark, Rob wrote:
what was the purpose of attach() then (which couldn't be accomplished in the first get_scatterlist())?
Ok, I'll try to explain with a use-case we've discussed: a video input device, some yuv plane scanout engine and an mpeg encoder, i.e. the video-shooting use-case. The mpeg encoder is optional and only engaged when the user presses the record button.
Now for live-view, userspace grabs some dma_buf ids (i.e. fds) from the v4l device and then creates a bunch of yuv planes at the scanout device with these. For this case you're right that we could arbitrate the memory layout at the get_scatterlist call of the scanout device (the other user, the frame-grabber, is already known because the dma_buf originated there).
But if we throw a third device into the mix, this won't work. At some point (and it doesn't matter how much we delay this till the complete pipeline is set up), either the scanout device or the encoder has to call get_scatterlist before the other one. And as soon as the allocator (respectively the originator) has handed out a scatterlist (either in system address space or device address space) it cannot move the backing storage anymore. If the other device then calls get_scatterlist and has more stringent buffer placement requirements, things stop working.
Now my idea is that when userspace establishes the shared buffer (i.e. subsystems convert the dma_buf fd number into a subsystem specific id), the device taking part in this buffer sharing only calls attach_device, but not get_scatterlist.
After this has taken place for all devices, userspace starts the pipeline. Kernel drivers only then call get_scatterlist and so force the allocation of the backing storage. And by then all participating devices are known, so the right placement can be chosen for the buffer.
So the reason for attach_device is to make sharing between 3 devices reliably possible (without userspace needing to know in which sequence to share buffers with devices). For 2 devices, attach_device isn't really necessary because the first device does an implicit attach_device when creating the buffer - it better not forget its own memory placement requirements ;-) So for just 2 devices delaying the allocation up to the call to get_scatterlist is good enough.
Cheers, Daniel
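The attach-then-map flow described above can be mocked like this. The mock_ names are invented, and a real exporter would merge full per-device constraints rather than a single alignment; the point is only the ordering: attach records requirements without allocating, and the first get_scatterlist allocates once every importer is known.

```c
/* Minimal mock of deferred allocation: nothing is backed until the
 * first map, by which point every importer has attached. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mock_buf {
    size_t size;
    size_t align;   /* strictest alignment requirement seen so far */
    void *backing;  /* NULL until the first get_scatterlist */
};

/* attach_device: no allocation, just merge in this device's needs */
void mock_attach(struct mock_buf *b, size_t dev_align)
{
    if (dev_align > b->align)
        b->align = dev_align;
}

/* first map forces allocation; placement now satisfies everyone */
void *mock_get_scatterlist(struct mock_buf *b)
{
    if (!b->backing)
        b->backing = aligned_alloc(b->align, b->size);
    return b->backing;
}
```

With three devices, the camera, scanout and encoder would each call mock_attach before the pipeline starts, so the single allocation already honours the encoder's (typically strictest) placement.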
On Fri, Sep 9, 2011 at 1:43 PM, Daniel Vetter daniel@ffwll.ch wrote:
My one concern is that, depending on the app/framework above, it may not be possible to know all the involved devices at the start.. or at least this would require more extensive userspace changes, not just userspace framework (like GStreamer) changes..
But how about this: if all the devices are known up front, this approach ensures that there will be no later buffer migration. And if they aren't, it ensures that there is at most one migration. (This all assuming the devices can physically share the same buffer.)
-----
struct dma_buf_attachment *
dma_buf_attach(struct dma_buf *buf, struct device *dev);
    --> establish intent to use the buffer w/ particular device

for (... # of times buffer is used ...) {
    struct scatterlist *
    dma_buf_get_scatterlist(struct dma_buf_attachment *attach);
        --> indicate begin of DMA; first call might actually back
            the buffer w/ pages or migrate the buffer

    void dma_buf_put_scatterlist(struct dma_buf_attachment *attach,
                                 struct scatterlist *sg);
        --> indicate end of DMA
}

void dma_buf_detach(struct dma_buf_attachment *attach);
    --> indicate buffer is no longer to be used by importing device
-----
with this sort of approach, if a new device is attached after the first get_scatterlist, the buffer can, if needed, be migrated using the union of all the devices' requirements, at a point in time when no DMA is active to/from the buffer. But if all the devices are known up front, then you never need to migrate unnecessarily.
This way, if userspace is able to ensure all devices are known upfront, everything is great. But if not, things don't explode and you aren't forced to reallocate buffers in userspace or anything ugly like this.
Optionally perhaps attach could be combined w/ dma_buf_get and detach w/ dma_buf_put (although then they probably wouldn't return a 'struct dma_buf *'.. not sure if that is weird). But I guess get+attach and detach+put would always go together and this saves an extra call. I don't have (yet) a strong opinion one way or another on that.
BR, -R
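The attach/get/put lifecycle above, with its at-most-one-migration property, could be mocked as follows. The mig_ names are invented; a real implementation would copy the buffer contents on migration and invalidate cached device mappings. A stricter device attaching after first use is handled by a single relocation at the next point where no get/put bracket is open:

```c
/* Mock of migrate-on-late-attach: a stricter device attaching after
 * first use is satisfied by one relocation at a quiescent point. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mig_buf {
    size_t size;
    size_t align;    /* union of attached devices' alignment needs */
    void *backing;
    int active;      /* open get/put brackets (DMA in flight) */
    int migrations;  /* how many times the buffer actually moved */
};

void mig_attach(struct mig_buf *b, size_t dev_align)
{
    if (dev_align > b->align)
        b->align = dev_align;
}

void *mig_get(struct mig_buf *b)
{
    int misplaced = b->backing &&
                    ((uintptr_t)b->backing % b->align) != 0;
    /* (re)allocate only while no DMA is active to/from the buffer */
    if (b->active == 0 && (!b->backing || misplaced)) {
        free(b->backing);  /* a real exporter would copy contents over */
        b->backing = aligned_alloc(b->align, b->size);
        if (misplaced)
            b->migrations++;
    }
    b->active++;
    return b->backing;
}

void mig_put(struct mig_buf *b)
{
    b->active--;
}
```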
On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
On Fri, Sep 9, 2011 at 1:43 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Sep 08, 2011 at 05:56:19PM -0500, Clark, Rob wrote:
On Thu, Sep 8, 2011 at 3:28 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Sep 08, 2011 at 12:39:41PM -0500, Clark, Rob wrote:
On Thu, Sep 8, 2011 at 10:55 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Sep 08, 2011 at 08:45:43PM +0530, Sumit Semwal wrote: > On 8 September 2011 19:48, Clark, Rob rob@ti.com wrote: > > actually, all I think we needed was to add 'struct device *' to > > get_scatterlist (and maybe doesn't hurt for put).. I'm not entirely > > sure why we need create/attach_device? The create fxn especially > > seems wrong.. what if you are the 2nd device to import the buffer? > > > Ah... I guess I misunderstood your 'it seems like a sane approach' to mean > you agree with the 2-step process suggested by Daniel :) - my bad!
On re-reading Rob's response I think there's a misunderstanding. I don't want a create function in the dma_buf_ops. The create would be a generic dma_buf subsystem function that (like in the patch) allocates the struct and properly registers the fd and what not else. The attach_device would then be the only new function in dma_buf_ops.
The confusion probably stems from my other proposal to hide these function pointers behind small inline helpers for the in-kernel api.
I guess/assume that what you are trying to get to is a way to differentiate between "I am done DMA'ing to the buffer now, but will likely use it again (ie. feel free to cache the mapping)", and "I am done forever w/ the buffer (don't bother caching)"?
Hm, no, I haven't really thought much about the release side. dri2/gem doesn't have any automatic caching of shared buffers, btw, that's all done manually (i.e. in the higher levels like the mesa/ddx drivers) by keeping track of recently shared buffers (and hanging onto them for as long as possible without creating havoc on the other side).
ok, then I've lost sync.. (ok, I admit to be only having time to read email for 5 min here and there, which might be contributing to the problem)..
what was the purpose of attach() then (which couldn't be accomplished in the first get_scatterlist())?
Ok, I'll try to explain with a use-case we've discussed: a video input device, some yuv plane scanout engine and an mpeg encoder, i.e. the video-shooting usecase. The mpeg encoder is optional and only engaged when the user presses the record button.
Now for live-view, userspace grabs some dma_buf ids (i.e. fds) from the v4l device and then creates a bunch of yuv planes at the scanout device with these. For this case you're right that we could arbitrate the memory layout at the get_scatterlist call of the scanout device (the other user, the frame-grabber, is already known because the dma_buf originated there).
But if we throw a third device into the mix, this won't work. At some point (and it doesn't matter how much we delay this till the complete pipeline is set up), either the scanout device or the encoder has to call get_scatterlist before the other one. And as soon as the allocator (respectively the originator) has handed out a scatterlist (either in system address space or device address space) it cannot move the backing storage anymore. If the other device then calls get_scatterlist and has more stringent buffer placement requirements, things stop working.
Now my idea is that when userspace establishes the shared buffer (i.e. subsystems convert the dma_buf fd number into a subsystem specific id), the device taking part in this buffer sharing only calls attach_device, but not get_scatterlist.
After this has taken place for all devices, userspace starts the pipeline. Kernel drivers only then call get_scatterlist and so force the allocation of the backing storage. And by then all participating devices are known, so the right placement can be chosen for the buffer.
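To make the two-step idea above concrete, here is a tiny mock-up in plain C (all names such as mock_attach and the placement flags are illustrative, not the proposed kernel API): attach only narrows the set of acceptable placements, and the first get_scatterlist call commits to a placement that satisfies every attached device.

```c
#include <assert.h>

/* Hypothetical placement capabilities a device could DMA from. */
#define PLACE_SYSTEM (1u << 0) /* scattered system pages       */
#define PLACE_CONTIG (1u << 1) /* physically contiguous memory */

struct mock_buf {
	unsigned int placements_ok; /* intersection over attached devices */
	unsigned int chosen;        /* 0 until backing storage is pinned  */
};

static void mock_buf_init(struct mock_buf *buf)
{
	buf->placements_ok = PLACE_SYSTEM | PLACE_CONTIG;
	buf->chosen = 0;
}

/* attach_device: record constraints only - no allocation happens here. */
static int mock_attach(struct mock_buf *buf, unsigned int dev_placements)
{
	if (buf->chosen) /* storage already handed out: too late to move */
		return (buf->chosen & dev_placements) ? 0 : -1;
	buf->placements_ok &= dev_placements;
	return buf->placements_ok ? 0 : -1;
}

/*
 * First get_scatterlist: commit to the cheapest placement that still
 * satisfies every device attached so far.
 */
static unsigned int mock_get_scatterlist(struct mock_buf *buf)
{
	if (!buf->chosen)
		buf->chosen = (buf->placements_ok & PLACE_SYSTEM) ?
				PLACE_SYSTEM : PLACE_CONTIG;
	return buf->chosen;
}
```

With all devices attached before the first get_scatterlist, a contiguous-only scanout engine forces PLACE_CONTIG up front and no later migration is needed.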
My one concern is that, depending on the app/framework above, it may not be possible to know all the involved devices at the start.. or at least this would require more extensive userspace changes, not just userspace framework (like GStreamer) changes..
But how about this.. I think this approach ensures that there will be no later buffer migration when all the devices are known up front; in the other case, it ensures that there is at most one migration. (This is all assuming the devices can physically share the same buffer.)
struct dma_buf_attachment *
dma_buf_attach(struct dma_buf *buf, struct device *dev);
    --> establish intent to use the buffer w/ a particular device

for (... # of times buffer is used ...) {

    struct scatterlist *
    dma_buf_get_scatterlist(struct dma_buf_attachment *attach);
        --> indicate begin of DMA; the first call might actually back
            the buffer w/ pages or migrate the buffer

    void
    dma_buf_put_scatterlist(struct dma_buf_attachment *attach,
                            struct scatterlist *);
        --> indicate end of DMA
}

void
dma_buf_detach(struct dma_buf_attachment *attach);
    --> indicate the buffer is no longer to be used by the importing device
I fear we are in violent agreement here ;-) ... this is essentially what I was aiming for, minus the rather nice polish you've added in the form of struct dma_buf_attachment.
with this sort of approach, if a new device is attached after the first get_scatterlist the buffer can be, if needed, migrated using the union of all the devices requirements at a point in time when no DMA is active to/from the buffer. But if all the devices are known up front, then you never need to migrate unnecessarily.
Well, the problem is with devices that hang onto mappings for way too long, so just waiting for all dma to finish to be able to fix up the buffer placement is a no-go. But I think we can postpone that issue a bit, especially since the drivers that tend to do this (gpus) can also evict objects willy-nilly, so that should be fixable with some explicit kill_your_mappings callback attached to dma_buf_attachment (or full-blown sync objects à la ttm).
This way, if userspace is able to ensure all devices are known upfront, everything is great. But if not, things don't explode and you aren't forced to reallocate buffers in userspace or anything ugly like this.
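The at-most-one-migration fallback discussed above can be mocked up the same way (again plain C with made-up names, not the real API): a late attach merely widens the requirement set, and migration is only attempted once no scatterlist is outstanding.

```c
#include <assert.h>

struct mig_buf {
	int active_maps;     /* get_scatterlist calls not yet put back    */
	unsigned int reqs;   /* union of requirements of attached devices */
	unsigned int placed; /* requirements the current storage meets    */
};

/* A device attaching late simply widens the requirements. */
static void mig_attach(struct mig_buf *buf, unsigned int dev_reqs)
{
	buf->reqs |= dev_reqs;
}

static void mig_get(struct mig_buf *buf) { buf->active_maps++; }
static void mig_put(struct mig_buf *buf) { buf->active_maps--; }

/* Migrate only while no DMA is in flight; returns -1 if busy. */
static int mig_try_migrate(struct mig_buf *buf)
{
	if (buf->active_maps > 0)
		return -1;       /* a mapping is live - defer the move */
	buf->placed = buf->reqs; /* pretend we reallocated to suit all */
	return 0;
}
```

This is deliberately the naive wait-for-idle scheme; as noted above, devices that cache mappings for a long time would additionally need something like a kill_your_mappings callback.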
Optionally perhaps attach could be combined w/ dma_buf_get and detach w/ dma_buf_put.. (although then they probably wouldn't return a 'struct dma_buf *'.. not sure if that is weird. But I guess get+attach and detach+put would always go together and this saves an extra call. I don't have (yet) a strong opinion one way or another on that.
Hm, maybe a separate attach makes sense if e.g. the userspace abstraction doesn't align with the kernel struct device layout. E.g. you create a gem object out of a dma_buf (attaching it to the gpu struct device) and later a kms scanout buffer out of the gem object (attaching the same dma_buf to the display subsystem). But for the common case of a simple v4l device or a simple encoder, a combined get+attach and put+detach certainly makes sense.
Cheers, Daniel
On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
On Fri, Sep 9, 2011 at 1:43 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Sep 08, 2011 at 05:56:19PM -0500, Clark, Rob wrote:
On Thu, Sep 8, 2011 at 3:28 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Sep 08, 2011 at 12:39:41PM -0500, Clark, Rob wrote:
[snip]
I fear we are in violent agreement here ;-) ...
oh, damn, that is what I was afraid of.. j/k ;-)
this is essentially what I was aiming for, minus the rather nice polish you've added in the form of struct dma_buf_attachment.
with this sort of approach, if a new device is attached after the first get_scatterlist the buffer can be, if needed, migrated using the union of all the devices requirements at a point in time when no DMA is active to/from the buffer. But if all the devices are known up front, then you never need to migrate unnecessarily.
Well, the problem is with devices that hang onto mappings for way too long, so just waiting for all dma to finish to be able to fix up the buffer placement is a no-go. But I think we can postpone that issue a bit, especially since the drivers that tend to do this (gpus) can also evict objects willy-nilly, so that should be fixable with some explicit kill_your_mappings callback attached to dma_buf_attachment (or full-blown sync objects à la ttm).
I'm ok if the weird fallback cases aren't fast.. I just don't want things to explode catastrophically in weird cases.
I guess in the GPU / deep pipeline case, you can at least set up to get an interrupt back when the GPU is done with some surface (ie. when it gets to a certain point in the command-stream)? I think it is ok if things stall in this case until the GPU pipeline is drained (and if you are targeting 60fps, that is probably still faster than video, likely at 30fps). Again, this is just for the cases where userspace doesn't do what we want, to avoid just complete failure..
If the GPU is the one importing the dmabuf, it just calls put_scatterlist() once it gets some interrupt from the GPU. If the GPU is the one exporting the dmabuf, then get_scatterlist() just blocks until it gets the interrupt from the GPU. (Well, I guess then do you need get_scatterlist_interruptible()?)
This way, if userspace is able to ensure all devices are known upfront, everything is great. But if not, things don't explode and you aren't forced to reallocate buffers in userspace or anything ugly like this.
Optionally perhaps attach could be combined w/ dma_buf_get and detach w/ dma_buf_put.. (although then they probably wouldn't return a 'struct dma_buf *'.. not sure if that is weird. But I guess get+attach and detach+put would always go together and this saves an extra call. I don't have (yet) a strong opinion one way or another on that.
Hm, maybe a separate attach makes sense if e.g. the userspace abstraction doesn't align with the kernel struct device layout. E.g. you create a gem object out of a dma_buf (attaching it to the gpu struct device) and later a kms scanout buffer out of the gem object (attaching the same dma_buf to the display subsystem). But for the common case of a simple v4l device or a simple encoder, a combined get+attach and put+detach certainly makes sense.
ok, let's keep 'em separate then
BR, -R
On Sun, Sep 11, 2011 at 10:32:20AM -0500, Clark, Rob wrote:
On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
[snip]
The problem with gpus is that they eat through data so _fast_ that not caching mappings kills performance. Now for simpler gpus we could shovel the mapping code into the dma/dma_buf subsystem and cache things there.
But desktop gpus already have (or will get) support for per-process gpu address spaces and I don't think it makes sense to put that complexity into generic layers (nor is it imo feasible across different gpus - per-process stuff tends to highly integrate with command submission). So I think we need some explicit unmap_ASAP callback support, but definitely not for v1 of dma_buf. But with attach separated from get_scatterlist and an explicit struct dma_buf_attachment around, such an extension should be pretty straightforward to implement. -Daniel
Hi All,
On 12 September 2011 19:37, Daniel Vetter daniel@ffwll.ch wrote:
On Sun, Sep 11, 2011 at 10:32:20AM -0500, Clark, Rob wrote:
On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
[snip]
Thanks - and as Rob educated me over IRC #linaro-mm-sig a little while back, we can put the 'list of attachments' in dma_buf, and do the generic list handling centrally as well. I should be able to post out v2 by tomorrow, I think - thanks to you gents! ~me.
On Mon, Sep 12, 2011 at 07:50:55PM +0530, Sumit Semwal wrote:
Thanks - and as Rob educated me over IRC #linaro-mm-sig a little while back, we can put the 'list of attachments' in dma_buf, and generic list handling as well centrally. I should be able to post out v2 by tomorrow I think, thanks to you gents!
Well, I've thought about whether we could track all the dma_buf_attachments in a list in the core dma_buf code, but haven't bothered to mention that ;-) What you're planning sounds good - I'm looking forward to reviewing v2!
Yours, Daniel
On Mon, Sep 12, 2011 at 9:07 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Sun, Sep 11, 2011 at 10:32:20AM -0500, Clark, Rob wrote:
On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
[snip]
hmm, I was thinking we could somehow make the GPU MMU look something like an IOMMU and do the caching behind that.. but I guess if you start talking about per-process address spaces and that sort of thing (GPU/command-stream driven context switches), then maybe that starts to stretch the limits of generic interfaces..
OTOH, I think the GPU subsystem would normally be the exporter of the buf.. other subsystems like v4l2 cameras or encoders/decoders that import the dmabuf would be substantially simpler.
I was thinking about whether or not to add a GEM ioctl to import dmabuf's, but I think the main use-case of that would be to re-import dmabuf's that are already backed by a GEM object.
BR, -R
On 09/11/2011 09:32 AM, Clark, Rob wrote:
If the GPU is the one importing the dmabuf, it just calls put_scatterlist() once it gets some interrupt from the GPU. If the GPU is the one exporting the dmabuf, then get_scatterlist() just blocks until it gets the interrupt from the GPU. (Well, I guess then do you need get_scatterlist_interruptible()?)
Yes. And get_scatterlist_interruptible_timeout() too :)
Jordan
On Thu, Sep 08, 2011 at 09:18:30AM -0500, Clark, Rob wrote:
On Thu, Sep 8, 2011 at 4:57 AM, Sumit Semwal sumit.semwal@linaro.org wrote:
Ok, so do you think the following as dma_buf_ops would be ok?
- create,
- attach_device,
- rename {get, put}_scatterlist to buffer_{map, unmap}, adding struct device*
actually, all I think we needed was to add 'struct device *' to get_scatterlist (and maybe doesn't hurt for put).. I'm not entirely sure why we need create/attach_device? The create fxn especially seems wrong.. what if you are the 2nd device to import the buffer?
Well, the create would just create the abstract handle, so you can attach devices.
All the discussion about deferring allocation of backing pages could still be handled in the first get_scatterlist() call.
Well, the original patch only has the get_scatterlist in dma_buf_ops. So you'll only know about that one device (and maybe the originator), which is not enough. Hence the attach_device. And after the first get_scatterlist we can't move the backing storage anymore.
Now I think we could always change that (userspace just sees some abstract handle) but I fear that switching (kernel-internally) to a two-step approach will be quite a bit of work after some drivers/subsystems start to use dma_buf.
Something else that got lost a bit in the attach_device discussion: I think we should directly return the scatterlist mapped into the address space of the mapping device (and hence call it map/unmap to keep in line with the existing dma api). That hopefully gives us a bit more leeway to do insane stuff with the backing storage (like device->device dma for e.g. ip blocks that can directly access gpu address space). Hence my proposal to add a struct device * parameter.
I think the interesting questions are:
- what to add to 'struct device_dma_parameters'
I think that would be highly (and imo should be) platform specific, e.g. on ARM a few things cascade like:
- normal pages -> large pages (due to puny tlbs) -> contiguous
- cached -> uncached memory
- anything else totally insane (like all of mem -> only from one bank)
We could implement this by a chain of allocators with page_alloc at the top and cma_alloc at the bottom. Arbitration could then take all known devices and walk down the allocator list until it fits.
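The chain-of-allocators arbitration described above might be sketched like this (a mock-up; the allocator names and capability flags are illustrative stand-ins, with page_alloc/cma_alloc only echoing the names in the mail):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Properties an allocator can guarantee (illustrative flags). */
#define CAP_NORMAL (1u << 0) /* plain pages           */
#define CAP_LARGE  (1u << 1) /* large pages           */
#define CAP_CONTIG (1u << 2) /* physically contiguous */

struct allocator {
	const char *name;
	unsigned int provides;
};

/* Cheapest at the top, most constrained (cma) at the bottom. */
static const struct allocator chain[] = {
	{ "page_alloc", CAP_NORMAL },
	{ "huge_alloc", CAP_NORMAL | CAP_LARGE },
	{ "cma_alloc",  CAP_NORMAL | CAP_LARGE | CAP_CONTIG },
};

/*
 * Walk down the chain until one allocator covers the union of all
 * known devices' requirements; NULL means no placement can work.
 */
static const char *arbitrate(unsigned int required)
{
	size_t i;

	for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		if ((chain[i].provides & required) == required)
			return chain[i].name;
	return NULL;
}
```

The walk stops at the first (cheapest) allocator that satisfies everyone, so buffers only fall through to cma when some attached device actually demands contiguous memory.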
btw, recent Intel gpus can use large pages (32kb), so that allocator infrastructure could perhaps be shared.
- how to handle things like "I need luma over here, chroma over there".. maybe this could be handled by making luma and chroma two buffers, and having some sub-devices?
Yeah I think for devices with extremely stringent requirements, splitting planar yuv formats into 2 or 3 buffers is the only way to go. If it's getting more extreme I think it's approaching the time to throw in the towel and tell the hw engineers to fix up their mess.
Obviously, the 'release' callback should then take care of doing all book-keeping related to 'detaching' the struct device* as well. Then we can have a 'central allocator' device which will use this framework to allow negotiation and sync'ing. Also, in the absence of this central allocator device, one of the subsystems can take up the role of allocator in the earlier-suggested way?
central allocator or not, I don't think that the importing device should know or care.. *who* allocates and *who* imports, doesn't really seem relevant to me for dmabuf API..
Yeah, I agree. As long as we shovel the get_scatterlist through an internal api and so separate the allocator of the backing storage from dma_buf api users, we should be free to change things without too much work.
I think for that it'd be good to have an inline encapsulating the abstract function call, so we can later change it to e.g. a central platform-specific allocator. I.e.
struct scatterlist *dma_buf_get_scatterlist(struct dma_buf *dma_buf)
{
	return dma_buf->ops->get_scatterlist(dma_buf);
}
Cheers, Daniel
On Wed, Aug 10, 2011 at 7:11 AM, Sumit Semwal sumit.semwal@ti.com wrote:
+/**
+ * struct dma_buf - shared buffer object
+ * @file: file pointer used for sharing buffers across, and for refcounting.
+ * @ops: dma_buf_ops associated with this buffer object
+ * @priv: user specific private data
+ */
+struct dma_buf {
+	struct file *file;
+	struct dma_buf_ops *ops;
+	void *priv;
+};
hmm, I wonder if we should stuff a ptr in there to store the creating driver (or device?) ptr? This way if it somehow gets passed back to the creating driver, it could realize that it is a dma_buf that it created in the first place.. and that it was safe to cast and deref the priv ptr.
BR, -R
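Rob's idea above - recording the creating device so an exporter can recognize its own buffer on re-import and safely dereference priv - could look roughly like this (a sketch; the mock_* types stand in for the kernel structs, and the owner field is not in the posted patch):

```c
#include <assert.h>
#include <stddef.h>

struct mock_device { int id; };

/* The posted struct dma_buf shape, plus the proposed originator pointer. */
struct mock_dma_buf {
	struct mock_device *owner; /* device that exported this buffer */
	void *priv;                /* exporter-private object          */
};

/*
 * On import, check whether we exported this buffer ourselves; only
 * then is it safe to cast priv back to our own object type instead
 * of going through the allocator-defined ops.
 */
static void *mock_import(struct mock_dma_buf *buf, struct mock_device *self)
{
	if (buf->owner == self)
		return buf->priv; /* our own buffer coming back to us */
	return NULL;              /* foreign buffer: use the ops path */
}
```

The pointer is compared only for identity, which matches Arnd's later point that any struct device works since only the driver that set it ever interprets it.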
Hi Rob,
On 25 August 2011 06:33, Clark, Rob rob@ti.com wrote:
[snip]
hmm, I wonder if we should stuff a ptr in there to store the creating driver (or device?) ptr? This way if it somehow gets passed back to the creating driver, it could realize that it is a dma_buf that it created in the first place.. and that it was safe to cast and deref the priv ptr.
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Everyone: if you could please spend some time for a review, it'd help me post something without obvious, stupid mistakes to the upstream mailing lists.
Thanks and best regards, ~Sumit.
BR, -R
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
On 25 August 2011 10:52, Sumit Semwal sumit.semwal@linaro.org wrote:
Hi Rob,
On 25 August 2011 06:33, Clark, Rob rob@ti.com wrote:
[snip]
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Btw, does this mean we can 'assume' that all users/allocators of dma-buf would be platform devices?
BR, ~me.
On Thursday, August 25, 2011 17:50:14 Sumit Semwal wrote:
On 25 August 2011 10:52, Sumit Semwal sumit.semwal@linaro.org wrote:
[snip]
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Btw, does this mean we can 'assume' that all users/allocators of dma-buf would be platform devices?
Definitely not. It can also be PCI or USB devices (tv/video capture devices for example). E.g. you capture from a USB webcam and want to pass that on to a platform scaler device.
Regards,
Hans
BR, ~me.
Everyone: if you could please spend some time for a review, it'd help me post something without obvious, stupid mistakes to the upstream mailing lists.
Thanks and best regards, ~Sumit.
BR, -R
On Thursday 25 August 2011, Sumit Semwal wrote:
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Btw, does this mean we can 'assume' that all users/allocators of dma-buf would be platform devices?
You can pass any struct device, since the pointer is only used by the driver that set the pointer it will be able to cast it back to the derived type.
Arnd
Hi Arnd,
On Aug 26, 2011 4:03 PM, "Arnd Bergmann" arnd@arndb.de wrote:
On Thursday 25 August 2011, Sumit Semwal wrote:
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Btw, does this mean we can 'assume' that all users/allocators of dma-buf would be platform devices?
You can pass any struct device, since the pointer is only used by the driver that set the pointer it will be able to cast it back to the derived type.
I think the idea here is for the original allocator to know that it allocated a given buffer; so the struct device here is used more as an identifier.
Br, ~Sumit.
Arnd
On Friday 26 August 2011, Sumit Semwal wrote:
On Aug 26, 2011 4:03 PM, "Arnd Bergmann" arnd@arndb.de wrote:
On Thursday 25 August 2011, Sumit Semwal wrote:
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Btw, does this mean we can 'assume' that all users/allocators of dma-buf would be platform devices?
You can pass any struct device, since the pointer is only used by the driver that set the pointer it will be able to cast it back to the derived type.
I think the idea here is for the original allocator to know that it allocated a given buffer; so the struct device here is used more as an identifier.
Wouldn't it know that implicitly because the function that it's in was called from a pointer in the buffer structure that can only be set by the driver that created that buffer?
I don't think there is a value in identifying the driver at all, what you need is an identification of the device. Of course when you have the device, any code can compare the dev->drv pointer to check which driver belongs to it, but I think that's not necessary.
Arnd
On Fri, Aug 26, 2011 at 9:31 AM, Arnd Bergmann arnd@arndb.de wrote:
On Friday 26 August 2011, Sumit Semwal wrote:
On Aug 26, 2011 4:03 PM, "Arnd Bergmann" arnd@arndb.de wrote:
On Thursday 25 August 2011, Sumit Semwal wrote:
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Btw, does this mean we can 'assume' that all users/allocators of dma-buf would be platform devices?
You can pass any struct device, since the pointer is only used by the driver that set the pointer it will be able to cast it back to the derived type.
I think the idea here is for the original allocator to know that it allocated a given buffer; so the struct device here is used more as an identifier.
Wouldn't it know that implicitly because the function that it's in was called from a pointer in the buffer structure that can only be set by the driver that created that buffer?
yeah, you're right.. brain-fart on my part, I can just check 'if (buf->ops == &my_ops) ...'
I don't think there is a value in identifying the driver at all, what you need is an identification of the device. Of course when you have the device, any code can compare the dev->drv pointer to check which driver belongs to it, but I think that's not necessary.
I could go either way here, either 'if (buf->ops == &my_ops && ((struct my_priv *)buf->priv)->dev == my_dev)' or 'if (buf->dev == my_dev)'.. the latter is more concise. I guess it depends on how common you expect multiple instances of the same driver to be.
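The two checks being compared can be modeled in plain userspace C. This is a sketch, not kernel code: 'struct my_priv', 'my_dev', and 'is_my_buffer' are hypothetical exporter-side names, and 'struct device' is stubbed out. The point is that the ops-pointer comparison is what makes the priv cast safe.

```c
#include <assert.h>
#include <stddef.h>

/* Stubs standing in for the kernel types in the patch. */
struct dma_buf_ops { int unused; };
struct device { int id; };

struct dma_buf {
	struct dma_buf_ops *ops;
	void *priv;
};

/* Hypothetical exporter-private data, not part of the proposed API. */
struct my_priv { struct device *dev; };

static struct dma_buf_ops my_ops;

/* Only when the ops pointer matches is it safe to cast priv to the
 * exporter's own type; the device compare then picks the instance. */
static int is_my_buffer(struct dma_buf *buf, struct device *my_dev)
{
	if (buf->ops != &my_ops)
		return 0;
	return ((struct my_priv *)buf->priv)->dev == my_dev;
}
```

With a per-buffer 'dev' field in struct dma_buf itself, the second comparison would collapse to 'buf->dev == my_dev', which is the more concise variant Rob mentions.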
BR, -R
Arnd
On Thu, Aug 25, 2011 at 12:22 AM, Sumit Semwal sumit.semwal@linaro.org wrote:
Hi Rob,
On 25 August 2011 06:33, Clark, Rob rob@ti.com wrote:
On Wed, Aug 10, 2011 at 7:11 AM, Sumit Semwal sumit.semwal@ti.com wrote:
+/**
+ * struct dma_buf - shared buffer object
+ * @file: file pointer used for sharing buffers across, and for refcounting.
+ * @ops: dma_buf_ops associated with this buffer object
+ * @priv: user specific private data
+ */
+struct dma_buf {
+	struct file *file;
+	struct dma_buf_ops *ops;
+	void *priv;
+};
hmm, I wonder if we should stuff a ptr in there to store the creating driver (or device?) ptr? This way if it somehow gets passed back to the creating driver, it could realize that it is a dma_buf that it created in the first place.. and that it was safe to cast and deref the priv ptr.
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well. Everyone: if you could please spend some time for a review, it'd help me post something without obvious, stupid mistakes to the upstream mailing lists.
well, don't assume that it is a platform_driver.. it should be just a plain 'driver'. Or maybe 'device'. I'm not sure about it. If you have multiple device instances with the same driver, I guess it is enough that they know how to interpret the 'void *priv'.
But in thinking through how things might play out in userspace, it seems to me that it is possible to hit scenarios of passing a buffer back to the device that created it. And I guess it would be useful to have a way to avoid evil twins.
BR, -R
Thanks and best regards, ~Sumit.
BR, -R
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
--
Thanks and regards,
Sumit Semwal
Linaro Kernel Engineer - Graphics working group
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
Hi Sumit,
I have a couple of questions related to the buffer object sharing:
a) Process A (decoder related) and Process B (graphics related) are sharing the buffer objects. For some reason Process B gets killed and respawns. In this process, it will now again request a fresh set of buffers. Now, are we going to release the previous buffers and do a huge buffer negotiation (discussed during Linaro Connect), or just pass back the existing buffer objects (fd's)? I think it doesn't make sense to stop decoding entirely (which is quite time-consuming), and hence we should pass back the already-allocated buffer objects.
b) Should the user space take care of the buffer locking mechanism? Meaning, in a multi-threaded scenario, if one fd (buffer object) is given to the decoder for filling, what if another thread (which may be responsible for passing this fd to another process, e.g. over a socket) accesses the same fd? Is it good to handle that in kernel space instead?
Regards, Subash
On 08/25/2011 10:52 AM, Sumit Semwal wrote:
Hi Rob,
On 25 August 2011 06:33, Clark, Rob rob@ti.com wrote:
On Wed, Aug 10, 2011 at 7:11 AM, Sumit Semwal sumit.semwal@ti.com wrote:
> +/**
> + * struct dma_buf - shared buffer object
> + * @file: file pointer used for sharing buffers across, and for refcounting.
> + * @ops: dma_buf_ops associated with this buffer object
> + * @priv: user specific private data
> + */
> +struct dma_buf {
> +	struct file *file;
> +	struct dma_buf_ops *ops;
> +	void *priv;
> +};
> +
hmm, I wonder if we should stuff a ptr in there to store the creating driver (or device?) ptr? This way if it somehow gets passed back to the creating driver, it could realize that it is a dma_buf that it created in the first place.. and that it was safe to cast and deref the priv ptr.
Hmmm... Do you mean the platform driver (device?) ptr? That's a good idea - I will make these changes you suggested, and re-post. I'm thinking of posting the next version to the upstream mailing lists as well.
Everyone: if you could please spend some time for a review, it'd help me post something without obvious, stupid mistakes to the upstream mailing lists.
Thanks and best regards, ~Sumit.
BR, -R
Hi Subash,
On 30 August 2011 11:12, Subash Patel subashrp@gmail.com wrote:
Hi Sumit,
I have a couple of questions related to the buffer object sharing:
a) Process A (decoder related) and Process B (graphics related) are sharing the buffer objects. For some reason Process B gets killed and respawns. In this process, it will now again request a fresh set of buffers. Now, are we going to release the previous buffers and do a huge buffer negotiation (discussed during Linaro Connect), or just pass back the existing buffer objects (fd's)? I think it doesn't make sense to stop decoding entirely (which is quite time-consuming), and hence we should pass back the already-allocated buffer objects.
The buffer sharing mechanism is not the same as the allocator mechanism, obviously. The allocator kernel driver would need to decide this, I guess; the 'get_scatterlist()' / 'put_scatterlist()' callback APIs give the allocator that control. The decision can either happen within the allocator driver, or be taken by the associated user-space framework - the buffer sharing framework should not pose any restrictions on this, I feel; the decision should lie outside of it.
b) Should the user space take care of the buffer locking mechanism? Meaning, in a multi-threaded scenario, if one fd (buffer object) is given to the decoder for filling, what if another thread (which may be responsible for passing this fd to another process, e.g. over a socket) accesses the same fd? Is it good to handle that in kernel space instead?
Buffer locking and syncing should be the responsibility of the allocating / exporting driver - again, get_scatterlist() / put_scatterlist() allow the exporter to somewhat 'control' access [read Rob's suggestion about this in one of his previous replies].
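How get_scatterlist() / put_scatterlist() can act as that control point is easiest to see as a pin count. Below is a plain userspace model, not the patch itself: 'struct my_buffer', 'pin_count', and 'can_migrate' are hypothetical exporter-internal names, and the real callbacks would of course also build and return the scatterlist.

```c
#include <assert.h>

/* Hypothetical exporter-private buffer state. */
struct my_buffer {
	int pin_count;	/* outstanding get_scatterlist() holders */
};

/* Exporter's get_scatterlist: pin the backing pages while an
 * importer holds the list (the real op also returns the sglist). */
static void my_get_scatterlist(struct my_buffer *buf)
{
	buf->pin_count++;
}

/* Exporter's put_scatterlist: drop one pin. */
static void my_put_scatterlist(struct my_buffer *buf)
{
	buf->pin_count--;
}

/* The exporter may relocate/reclaim the backing pages only when
 * no importer currently holds the scatterlist. */
static int can_migrate(const struct my_buffer *buf)
{
	return buf->pin_count == 0;
}
```

The same counter is also the natural hook for any exporter-side serialization between concurrent users, which is the 'control' referred to above.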
Best regards,
~Sumit.
On Wednesday, August 10, 2011 14:11:04 Sumit Semwal wrote:
This RFC is aimed at introducing the buffer sharing framework for review.
Since almost all the discussion about buffer sharing objects happened on linaro-mm-sig list, I am sending it first for review within the list, and will share it with other lists after first-set of review comments from here.
--
This is the first step in defining a buffer sharing framework. A new buffer object is added, with hooks to allow for easy sharing of this buffer object across devices.
The idea was first mooted at the Linaro memory management mini-summit in Budapest in May 2011, as part of multiple things needed for a 'unified memory management framework'. It took a more concrete shape at Linaro memory-management mini-summit in Cambridge, Aug 2011.
The framework allows:
- a new buffer-object to be created, which associates a file pointer with each user-buffer and associated allocator-defined operations on that buffer. This operation is called the 'export' operation.
- this exported buffer-object to be shared with the other entity by asking for its 'file-descriptor (fd)', and sharing the fd across.
- a received fd to get the buffer object back, where it can be accessed using the associated allocator-defined operations.
- the exporter and importer to share the scatterlist using get_ and put_ operations.
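The export → fd → get → put sequence above can be sketched as a plain userspace C model. This is not the kernel patch: the explicit 'refcount' field stands in for the anon file's refcounting done by fget()/fput(), and this model's dma_buf_get() takes a pointer where the real API takes an fd.

```c
#include <assert.h>
#include <stdlib.h>

struct dma_buf;

struct dma_buf_ops {
	void (*release)(struct dma_buf *);
};

struct dma_buf {
	int refcount;		/* stands in for the file's f_count */
	struct dma_buf_ops *ops;
	void *priv;
};

/* 'export': wrap allocator priv + ops, holding one reference. */
static struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops)
{
	struct dma_buf *dmabuf = calloc(1, sizeof(*dmabuf));
	if (!dmabuf)
		return NULL;
	dmabuf->priv = priv;
	dmabuf->ops = ops;
	dmabuf->refcount = 1;	/* the exporter's reference */
	return dmabuf;
}

/* importer side: take a reference (fget() in the real code) */
static struct dma_buf *dma_buf_get(struct dma_buf *dmabuf)
{
	dmabuf->refcount++;
	return dmabuf;
}

/* drop a reference; last put triggers the allocator's release() */
static void dma_buf_put(struct dma_buf *dmabuf)
{
	if (--dmabuf->refcount == 0) {
		dmabuf->ops->release(dmabuf);
		free(dmabuf);
	}
}
```

The invariant illustrated: the allocator-defined release() runs exactly once, at the final put, regardless of how many importers took the buffer.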
Some file operations are provided as wrappers over the allocator-defined operations, which allows usage of fops (e.g. mmap) on the associated 'fd'.
This is based on design suggestions from many people at both the mini-summits, most notably from Arnd Bergmann arnd@arndb.de, and Rob Clark rob@ti.com.
The implementation is inspired from proof-of-concept patch-set from Tomasz Stanislawski t.stanislaws@samsung.com, who demonstrated buffer sharing between two v4l2 devices.
Signed-off-by: Sumit Semwal sumit.semwal@ti.com
 drivers/base/Kconfig    |   10 +++
 drivers/base/Makefile   |    1 +
 drivers/base/dma-buf.c  |  196 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |  105 +++++++++++++++++++++++++
 4 files changed, 312 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-buf.c
 create mode 100644 include/linux/dma-buf.h
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d57e8d0..5398ce8 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -168,4 +168,14 @@ config SYS_HYPERVISOR
 	bool
 	default n
 
+config DMA_SHARED_BUFFER
+	bool "Buffer framework to be shared between drivers"
+	default n
+	depends on ANON_INODES
+	help
+	  This option enables the framework for buffer-sharing between
+	  multiple drivers. A buffer is associated with a file using driver
+	  APIs extension; the file's descriptor can then be passed on to other
+	  driver.
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4c5701c..bd95732 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
+obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
 obj-$(CONFIG_ISA)	+= isa.o
 obj-$(CONFIG_FW_LOADER)	+= firmware_class.o
 obj-$(CONFIG_NUMA)	+= node.o
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
new file mode 100644
index 0000000..e35b385
--- /dev/null
+++ b/drivers/base/dma-buf.c
@@ -0,0 +1,196 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2011 Texas Instruments Inc. All rights reserved.
+ * Author: Sumit Semwal <sumit.semwal@ti.com>
+ *
+ * Many thanks to linaro-mm-sig list, and specially
+ * Arnd Bergmann <arnd@arndb.de> and Rob Clark <rob@ti.com>
+ * for their support in creation and refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/dma-buf.h>
+#include <linux/anon_inodes.h>
+
+static int is_dma_buf_file(struct file *);
+
+/* file operation wrappers for dma buf ops */
+static ssize_t dma_buf_read(struct file *file, char __user *buf, size_t size,
+		loff_t *offset)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	return dmabuf->ops->read(dmabuf, buf, size);
The 'read' ops should be optional. Not all drivers will support this.
+}
+
+static ssize_t dma_buf_write(struct file *file, char __user *buf, size_t size,
+		loff_t *offset)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	return dmabuf->ops->write(dmabuf, buf, size);
Ditto for write...
+}
+
+static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	return dmabuf->ops->mmap(dmabuf, vma);
...and mmap.
I would expect to see an ioctl op as well, if only to pass the size of the buffer back to userspace.
And if we support read/write, then we need a seek as well.
Frankly, I'm not sure about the read and write. I don't really see the use case. The whole point of buffer sharing is to avoid copying buffers, and that's exactly what read and write do.
+}
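[One way to address the point above that read/write/mmap should be optional is a NULL check on the op pointer in each wrapper. A minimal userspace model of that guard follows; kernel types are stubbed out, and returning -EINVAL simply mirrors the existing wrappers — the real patch might prefer a different errno:]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Stubs standing in for the kernel types in the patch. */
struct dma_buf;

struct dma_buf_ops {
	ssize_t (*read)(struct dma_buf *, void *, size_t);
};

struct dma_buf {
	struct dma_buf_ops *ops;
};

/* Wrapper tolerating exporters that leave ->read unset. */
static ssize_t model_dma_buf_read(struct dma_buf *dmabuf, void *buf,
				  size_t size)
{
	if (!dmabuf->ops->read)
		return -EINVAL;	/* exporter chose not to support read */
	return dmabuf->ops->read(dmabuf, buf, size);
}
```

The same guard applies unchanged to the write and mmap wrappers.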
+
+static int dma_buf_release(struct inode *inode, struct file *file)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	dmabuf->ops->release(dmabuf);
+	kfree(dmabuf);
+	return 0;
+}
+
+static const struct file_operations dma_buf_fops = {
+	.mmap		= dma_buf_mmap,
+	.read		= dma_buf_read,
+	.write		= dma_buf_write,
+	.release	= dma_buf_release,
+};
+
+/*
+ * is_dma_buf_file - Check if struct file* is associated with dma_buf
+ */
+static inline int is_dma_buf_file(struct file *file)
+{
+	return file->f_op == &dma_buf_fops;
+}
+
+/**
+ * dma_buf_export - Creates a new dma_buf, and associates an anon file
+ * with this buffer, so it can be exported.
+ * Also 'attach' the allocator specific data and ops to the buffer.
+ * @priv:	Attach private data of allocator to this buffer
+ * @ops:	Attach allocator-defined dma buf ops to the new buffer.
+ *
+ * Returns, on success, a newly created dma_buf object, which wraps the
+ * supplied private data and operations for dma_buf_ops. On failure to
+ * allocate the dma_buf object, it can return NULL.
+ */
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops)
+{
+	struct dma_buf *dmabuf;
+	struct file *file;
+
+	BUG_ON(!priv || !ops);
+
+	dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
+	if (dmabuf == NULL)
+		return dmabuf;
+
+	dmabuf->priv = priv;
+	dmabuf->ops = ops;
+
+	file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, 0);
+
+	dmabuf->file = file;
+
+	file->private_data = dmabuf;
+
+	return dmabuf;
+}
+EXPORT_SYMBOL(dma_buf_export);
+
+/**
+ * dma_buf_fd - returns a file descriptor for the given dma_buf
+ * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
+ *
+ * On success, returns an associated 'fd'. Else, returns error.
+ */
+int dma_buf_fd(struct dma_buf *dmabuf)
+{
+	int error, fd;
+
+	if (!dmabuf->file)
+		return -EINVAL;
+
+	error = get_unused_fd_flags(0);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	fd_install(fd, dmabuf->file);
+
+	return fd;
+}
+EXPORT_SYMBOL(dma_buf_fd);
+
+/**
+ * dma_buf_get - returns the dma_buf structure related to an fd
+ * @fd:	[in]	fd associated with the dma_buf to be returned
+ *
+ * On success, returns the dma_buf structure associated with an fd; uses
+ * file's refcounting done by fget to increase refcount. returns ERR_PTR
+ * otherwise.
+ */
+struct dma_buf *dma_buf_get(int fd)
+{
+	struct file *file;
+
+	file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EBADF);
+
+	if (!is_dma_buf_file(file)) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return file->private_data;
+}
+EXPORT_SYMBOL(dma_buf_get);
+
+/**
+ * dma_buf_put - decreases refcount of the buffer
+ * @dmabuf:	[in]	buffer to reduce refcount of
+ *
+ * Uses file's refcounting done implicitly by fput()
+ */
+void dma_buf_put(struct dma_buf *dmabuf)
+{
+	BUG_ON(!dmabuf->file);
+
+	fput(dmabuf->file);
+
+	return;
+}
+EXPORT_SYMBOL(dma_buf_put);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
new file mode 100644
index 0000000..c22896a
--- /dev/null
+++ b/include/linux/dma-buf.h
@@ -0,0 +1,105 @@
+/*
+ * Header file for dma buffer sharing framework.
+ *
+ * Copyright(C) 2011 Texas Instruments Inc. All rights reserved.
+ * Author: Sumit Semwal <sumit.semwal@ti.com>
+ *
+ * Many thanks to linaro-mm-sig list, and specially
+ * Arnd Bergmann <arnd@arndb.de> and Rob Clark <rob@ti.com>
+ * for their support in creation and refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_BUF_H__
+#define __DMA_BUF_H__
+
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/scatterlist.h>
+
+struct dma_buf;
+
+/**
+ * struct dma_buf_ops - operations possible on struct dma_buf
+ * @get_scatterlist: returns list of scatter pages allocated, increases
+ *		     usecount of the buffer
+ * @put_scatterlist: decreases usecount of buffer, might deallocate scatter
+ *		     pages
+ * @mmap: map this buffer
+ * @read: read from this buffer
+ * @write: write to this buffer
+ * @release: release this buffer; to be called after the last dma_buf_put
+ * @sync_sg_for_cpu: sync the sg list for cpu
+ * @sync_sg_for_device: sync the sg list for device
+ */
+struct dma_buf_ops {
+	/* allow buffer to not be pinned when DMA is not happening */
+	struct scatterlist * (*get_scatterlist)(struct dma_buf *);
+	void (*put_scatterlist)(struct dma_buf *, struct scatterlist *);
+
+	/* allow allocator to mmap/read/write to take care of cache attrib */
+	int (*mmap)(struct dma_buf *, struct vm_area_struct *);
+	ssize_t (*read)(struct dma_buf *, void *, size_t);
+	ssize_t (*write)(struct dma_buf *, void *, size_t);
+
+	/* after final dma_buf_put() */
+	void (*release)(struct dma_buf *);
+
+	/* allow allocator to take care of cache ops */
+	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
+	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
+};
+
+/**
+ * struct dma_buf - shared buffer object
+ * @file: file pointer used for sharing buffers across, and for refcounting.
+ * @ops: dma_buf_ops associated with this buffer object
+ * @priv: user specific private data
+ */
+struct dma_buf {
+	struct file *file;
+	struct dma_buf_ops *ops;
+	void *priv;
+};
+
+#ifdef CONFIG_DMA_SHARED_BUFFER
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops);
+int dma_buf_fd(struct dma_buf *dmabuf);
+struct dma_buf *dma_buf_get(int fd);
+void dma_buf_put(struct dma_buf *dmabuf);
+#else
+static inline struct dma_buf *dma_buf_export(void *priv,
+					struct dma_buf_ops *ops)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline int dma_buf_fd(struct dma_buf *dmabuf)
+{
+	return -ENODEV;
+}
+
+static inline struct dma_buf *dma_buf_get(int fd)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void dma_buf_put(struct dma_buf *dmabuf)
+{
+	return;
+}
+#endif /* CONFIG_DMA_SHARED_BUFFER */
+
+#endif /* __DMA_BUF_H__ */
Regards,
Hans
On Fri, Aug 26, 2011 at 4:06 AM, Hans Verkuil hverkuil@xs4all.nl wrote:
Frankly, I'm not sure about the read and write. I don't really see the use case. The whole point of buffer sharing is to avoid copying buffers, and that's exactly what read and write do.
The main point (and maybe there is no point) for read/write is an alternative interface for sw access to buffer.. it isn't really intended for performant use cases, but rather for edge-cases. Like when we want to grab one frame of video and use sw to generate a thumbnail.
Possibly mmap interface would be fine. But read/write seems attractive because you could do an easier job of hiding some weird formats. (Which I suppose you could still do w/ mmap and fault handling games.)
I don't have any immediate plans for such use (mmap would suffice), but it did seem good to have all the normal file ops supported. If someone passed me a fd in userspace, I would expect to be able to read() and write(), and would be a bit surprised if that wasn't supported. So I guess principle-of-least-surprise applies here.
BR, -R
On 08/26/2011 10:28 AM, Clark, Rob wrote:
On Fri, Aug 26, 2011 at 4:06 AM, Hans Verkuil hverkuil@xs4all.nl wrote:
Frankly, I'm not sure about the read and write. I don't really see the use case. The whole point of buffer sharing is to avoid copying buffers, and that's exactly what read and write do.
The main point (and maybe there is no point) for read/write is an alternative interface for sw access to buffer.. it isn't really intended for performant use cases, but rather for edge-cases. Like when we want to grab one frame of video and use sw to generate a thumbnail.
Possibly mmap interface would be fine. But read/write seems attractive because you could do an easier job of hiding some weird formats. (Which I suppose you could still do w/ mmap and fault handling games.)
I don't have any immediate plans for such use (mmap would suffice), but it did seem good to have all the normal file ops supported. If someone passed me a fd in userspace, I would expect to be able to read() and write(), and would be a bit surprised if that wasn't supported. So I guess principle-of-least-surprise applies here.
In that same line of thought you should implement lseek too. I wouldn't find it odd to not have read/write available, but that might be shaded by my prior DRM experience.
Personally, if I was to use a read/write interface I would want something close to the PWRITE ioctl that the i915 DRM driver uses, but I agree that in any event read/write will be a lesser used path, except possibly in the case of SW rendering (and I _hope_ that is a lesser used path).
Jordan
On Fri, Aug 26, 2011 at 12:30 PM, Jordan Crouse jcrouse@quicinc.com wrote:
On 08/26/2011 10:28 AM, Clark, Rob wrote:
On Fri, Aug 26, 2011 at 4:06 AM, Hans Verkuil hverkuil@xs4all.nl wrote:
Frankly, I'm not sure about the read and write. I don't really see the use case. The whole point of buffer sharing is to avoid copying buffers, and that's exactly what read and write do.
The main point (and maybe there is no point) for read/write is an alternative interface for sw access to buffer.. it isn't really intended for performant use cases, but rather for edge-cases. Like when we want to grab one frame of video and use sw to generate a thumbnail.
Possibly mmap interface would be fine. But read/write seems attractive because you could do an easier job of hiding some weird formats. (Which I suppose you could still do w/ mmap and fault handling games.)
I don't have any immediate plans for such use (mmap would suffice), but it did seem good to have all the normal file ops supported. If someone passed me a fd in userspace, I would expect to be able to read() and write(), and would be a bit surprised if that wasn't supported. So I guess principle-of-least-surprise applies here.
In that same line of thought you should implement lseek too. I wouldn't find it odd to not have read/write available, but that might be shaded by my prior DRM experience.
yeah, I assume lseek (if we have read/write).. well default_llseek would be enough for any scenario I could think of, but I guess we might as well let the exporting driver provide default_llseek if that is what it wants
re: DRM.. well, I assume that is because you don't have a fd for GEM objects..
anyways, "normal" fileops aren't a must-have.. the use-cases of sw access that I could think of would probably prefer mmap. There shouldn't be any issue to add them later if someone came up for a good reason to have 'em.
BR, -R
Personally, if I was to use a read/write interface I would want something close to the PWRITE ioctl that the i915 DRM driver uses, but I agree that in any event read/write will be a lesser used path, except possibly in the case of SW rendering (and I _hope_ that is a lesser used path).
Jordan
Note that read/write can be given semantics that mmap + memory-mapped access doesn't provide. (Block until you get an I frame, then capture a scaled-down picture with no tearing artifacts, all in a single read() syscall ...)
On 08/27/2011 03:06 AM, Michael K. Edwards wrote:
Note that read/write can be given semantics that mmap + memory-mapped access doesn't provide. (Block until you get an I frame, then capture a scaled-down picture with no tearing artifacts, all in a single read() syscall ...)
Your use case of blocking on read() can be helpful only for a movie thumbnail in gallery. Otherwise, blocking on read(fd) in a userspace can result in frame loss during the encode kind of operation.
Regards, Subash
On Mon, Aug 29, 2011 at 10:16 PM, Subash Patel subashrp@gmail.com wrote:
Your use case of blocking on read() can be helpful only for a movie thumbnail in gallery. Otherwise, blocking on read(fd) in a userspace can result in frame loss during the encode kind of operation.
Yes, that's exactly the kind of use case I had in mind. Similarly, frame-grab from userland on a pipeline whose continuous flow is in-kernel. The argument is not that you shouldn't have mmap(), it's that you should also have read(), because there are use cases for it.
- M
On 26 August 2011 23:39, Clark, Rob rob@ti.com wrote:
On Fri, Aug 26, 2011 at 12:30 PM, Jordan Crouse jcrouse@quicinc.com wrote:
On 08/26/2011 10:28 AM, Clark, Rob wrote:
On Fri, Aug 26, 2011 at 4:06 AM, Hans Verkuil hverkuil@xs4all.nl wrote:
Frankly, I'm not sure about the read and write. I don't really see the use case. The whole point of buffer sharing is to avoid copying buffers, and that's exactly what read and write do.
The main point (and maybe there is no point) for read/write is an alternative interface for sw access to buffer.. it isn't really intended for performant use cases, but rather for edge-cases. Like when we want to grab one frame of video and use sw to generate a thumbnail.
Possibly mmap interface would be fine. But read/write seems attractive because you could do an easier job of hiding some weird formats. (Which I suppose you could still do w/ mmap and fault handling games.)
I don't have any immediate plans for such use (mmap would suffice), but it did seem good to have all the normal file ops supported. If someone passed me a fd in userspace, I would expect to be able to read() and write(), and would be a bit surprised if that wasn't supported. So I guess principle-of-least-surprise applies here.
In that same line of thought you should implement lseek too. I wouldn't find it odd to not have read/write available, but that might be shaded by my prior DRM experience.
yeah, I assume lseek (if we have read/write).. well default_llseek would be enough for any scenario I could think of, but I guess we might as well let the exporting driver provide default_llseek if that is what it wants
re: DRM.. well, I assume that is because you don't have a fd for GEM objects..
anyways, "normal" fileops aren't a must-have.. the use-cases of sw access that I could think of would probably prefer mmap. There shouldn't be any issue to add them later if someone came up for a good reason to have 'em.
Ok - so to consolidate the feedback: I will remove the read()/write() calls [from both dma-buf ops and fops], and leave mmap() in. If during the course of development, like Rob said, anyone comes up with a good reason for them to be in, we can add them back.
BR, ~Sumit.
BR, -R
Personally, if I was to use a read/write interface I would want something close to the PWRITE ioctl that the i915 DRM driver uses, but I agree that in any event read/write will be a lesser used path, except possibly in the case of SW rendering (and I _hope_ that is a lesser used path).
Jordan