[RFCv2 PATCH 0/9] Integration of videobuf2 with dmabuf

List overview All Threads
Download

newer

older

2012疯狂销售▲团队建设

[PULL REQUEST] DMA-mapping...

Tomasz Stanislawski

13 Mar 2012 13 Mar '12

10:16 a.m.

Hello everyone, This patchset is an incremental patch to patchset created by Sumit Semwal [1]. The patches are dedicated to help find a better solution for support of buffer sharing by V4L2 API. It is expected to start discussion on the final installment for dma-buf in vb2-dma-contig allocator. Current version of the patches contain little documentation. It is going to be fixed after achieving consensus about design for buffer exporting. Moreover the API between vb2-core and the allocator should be revised.

The patches were successfully tested to cooperate with EXYNOS DRM driver using DMABUF mechanism.

Please note, that the amount of changes to vb2-dma-contig.c was significant making the difference patch very difficult to read.

The patchset makes use of dma_get_pages extension for DMA API, which is posted on a top of dma-mapping patches by Marek Szyprowski [4] [5].

The tree, that contains all needed patches, can be found here [6].

v2: - extended VIDIOC_EXPBUF argument from integer memoffset to struct v4l2_exportbuffer - added patch that breaks DMABUF spec on (un)map_atachment callcacks but allows to work with existing implementation of DMABUF prime in DRM - all dma-contig code refactoring patches were squashed - bugfixes

v1: List of changes since [1]. - support for DMA api extension dma_get_pages, the function is used to retrieve pages used to create DMA mapping. - small fixes/code cleanup to videobuf2 - added prepare and finish callbacks to vb2 allocators, it is used keep consistency between dma-cpu acess to the memory (by Marek Szyprowski) - support for exporting of DMABUF buffer in V4L2 and Videobuf2, originated from [3]. - support for dma-buf exporting in vb2-dma-contig allocator - support for DMABUF for s5p-tv and s5p-fimc (capture interface) drivers, originated from [3] - changed handling for userptr buffers (by Marek Szyprowski, Andrzej Pietrasiewicz) - let mmap method to use dma_mmap_writecombine call (by Marek Szyprowski)

[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/42966... [2] https://lkml.org/lkml/2011/12/26/29 [3] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/36354... [4] http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819 [5] http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/... [6] http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/...

Sumit Semwal (1): v4l: vb2: Add dma-contig allocator as dma_buf user

Tomasz Stanislawski (8): v4l: vb2: fixes for DMABUF support v4l: vb2-dma-contig: update and code refactoring v4l: add buffer exporting via dmabuf v4l: vb2: add buffer exporting via dmabuf v4l: vb2-dma-contig: add support for DMABUF exporting v4l: vb2-dma-contig: change map/unmap behaviour v4l: fimc: integrate capture i-face with dmabuf v4l: s5p-tv: mixer: integrate with dmabuf

drivers/media/video/Kconfig | 1 + drivers/media/video/s5p-fimc/fimc-capture.c | 11 +- drivers/media/video/s5p-tv/Kconfig | 1 + drivers/media/video/s5p-tv/mixer_video.c | 12 +- drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-ioctl.c | 11 + drivers/media/video/videobuf2-core.c | 88 +++- drivers/media/video/videobuf2-dma-contig.c | 717 ++++++++++++++++++++++++--- include/linux/videodev2.h | 20 + include/media/v4l2-ioctl.h | 2 + include/media/videobuf2-core.h | 8 +- 11 files changed, 779 insertions(+), 93 deletions(-)

-- 1.7.5.4

Show replies by date

Tomasz Stanislawski

13 Mar 13 Mar

10:16 a.m.

New subject: [RFCv2 PATCH 1/9] v4l: vb2: fixes for DMABUF support

This patch contains fixes to DMABUF support in vb2-core. - fixes number of arguments of call_memop macro - fixes setup of plane length - fixes handling of error pointers

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/videobuf2-core.c | 24 +++++++++++------------- include/media/videobuf2-core.h | 6 +++--- 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/media/video/videobuf2-core.c b/drivers/media/video/videobuf2-core.c index 951cb56..e7df560 100644 --- a/drivers/media/video/videobuf2-core.c +++ b/drivers/media/video/videobuf2-core.c @@ -118,7 +118,7 @@ static void __vb2_buf_dmabuf_put(struct vb2_buffer *vb) void *mem_priv = vb->planes[plane].mem_priv;

if (mem_priv) { - call_memop(q, plane, detach_dmabuf, mem_priv); + call_memop(q, detach_dmabuf, mem_priv); dma_buf_put(vb->planes[plane].dbuf); vb->planes[plane].dbuf = NULL; vb->planes[plane].mem_priv = NULL; @@ -905,6 +905,8 @@ static int __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b, } if (b->memory == V4L2_MEMORY_DMABUF) { for (plane = 0; plane < vb->num_planes; ++plane) { + v4l2_planes[plane].bytesused = + b->m.planes[plane].bytesused; v4l2_planes[plane].m.fd = b->m.planes[plane].m.fd; } } @@ -1052,17 +1054,13 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) if (IS_ERR_OR_NULL(dbuf)) { dprintk(1, "qbuf: invalid dmabuf fd for " "plane %d\n", plane); - ret = PTR_ERR(dbuf); + ret = -EINVAL; goto err; }

- /* this doesn't get filled in until __fill_vb2_buffer(), - * since it isn't known until after dma_buf_get().. - */ - planes[plane].length = dbuf->size; - /* Skip the plane if already verified */ if (dbuf == vb->planes[plane].dbuf) { + planes[plane].length = dbuf->size; dma_buf_put(dbuf); continue; } @@ -1072,7 +1070,7 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)

/* Release previously acquired memory if present */ if (vb->planes[plane].mem_priv) { - call_memop(q, plane, detach_dmabuf, + call_memop(q, detach_dmabuf, vb->planes[plane].mem_priv); dma_buf_put(vb->planes[plane].dbuf); } @@ -1080,8 +1078,8 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) vb->planes[plane].mem_priv = NULL;

/* Acquire each plane's memory */ - mem_priv = q->mem_ops->attach_dmabuf( - q->alloc_ctx[plane], dbuf); + mem_priv = call_memop(q, attach_dmabuf, q->alloc_ctx[plane], + dbuf, q->plane_sizes[plane], write); if (IS_ERR(mem_priv)) { dprintk(1, "qbuf: failed acquiring dmabuf " "memory for plane %d\n", plane); @@ -1089,6 +1087,7 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) goto err; }

+ planes[plane].length = dbuf->size; vb->planes[plane].dbuf = dbuf; vb->planes[plane].mem_priv = mem_priv; } @@ -1098,8 +1097,7 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) * the buffer(s).. */ for (plane = 0; plane < vb->num_planes; ++plane) { - ret = q->mem_ops->map_dmabuf( - vb->planes[plane].mem_priv, write); + ret = call_memop(q, map_dmabuf, vb->planes[plane].mem_priv); if (ret) { dprintk(1, "qbuf: failed mapping dmabuf " "memory for plane %d\n", plane); @@ -1527,7 +1525,7 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking) */ if (q->memory == V4L2_MEMORY_DMABUF) for (plane = 0; plane < vb->num_planes; ++plane) - call_memop(q, plane, unmap_dmabuf, + call_memop(q, unmap_dmabuf, vb->planes[plane].mem_priv);

switch (vb->state) { diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index d8b8171..412c6a4 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -88,10 +88,10 @@ struct vb2_mem_ops { * in the vb2 core, and vb2_mem_ops really just need to get/put the * sglist (and make sure that the sglist fits it's needs..) */ - void *(*attach_dmabuf)(void *alloc_ctx, - struct dma_buf *dbuf); + void *(*attach_dmabuf)(void *alloc_ctx, struct dma_buf *dbuf, + unsigned long size, int write); void (*detach_dmabuf)(void *buf_priv); - int (*map_dmabuf)(void *buf_priv, int write); + int (*map_dmabuf)(void *buf_priv); void (*unmap_dmabuf)(void *buf_priv);

void *(*vaddr)(void *buf_priv);

-- 1.7.5.4

Laurent Pinchart

22 Mar 22 Mar

7:56 a.m.

New subject: [RFCv2 PATCH 1/9] v4l: vb2: fixes for DMABUF support

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:16:59 Tomasz Stanislawski wrote:

...

This patch contains fixes to DMABUF support in vb2-core.

fixes number of arguments of call_memop macro

fixes setup of plane length

fixes handling of error pointers

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

Tested-by: Laurent Pinchart laurent.pinchart@ideasonboard.com

I suppose this will be squashed with Sumit's "[RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)" patch.

...

drivers/media/video/videobuf2-core.c | 24 +++++++++++------------- include/media/videobuf2-core.h | 6 +++--- 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/media/video/videobuf2-core.c b/drivers/media/video/videobuf2-core.c index 951cb56..e7df560 100644 --- a/drivers/media/video/videobuf2-core.c +++ b/drivers/media/video/videobuf2-core.c @@ -118,7 +118,7 @@ static void __vb2_buf_dmabuf_put(struct vb2_buffer *vb) void *mem_priv = vb->planes[plane].mem_priv;
if (mem_priv) {
	call_memop(q, plane, detach_dmabuf, mem_priv);
	call_memop(q, detach_dmabuf, mem_priv);
dma_buf_put(vb->planes[plane].dbuf);
vb->planes[plane].dbuf = NULL;
vb->planes[plane].mem_priv = NULL;
@@ -905,6 +905,8 @@ static int __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b, } if (b->memory == V4L2_MEMORY_DMABUF) { for (plane = 0; plane < vb->num_planes; ++plane) {
		v4l2_planes[plane].bytesused =
			b->m.planes[plane].bytesused;
	v4l2_planes[plane].m.fd = b->m.planes[plane].m.fd;
}
}
@@ -1052,17 +1054,13 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) if (IS_ERR_OR_NULL(dbuf)) { dprintk(1, "qbuf: invalid dmabuf fd for " "plane %d\n", plane);
	ret = PTR_ERR(dbuf);
	ret = -EINVAL;
goto err;
}
/* this doesn't get filled in until __fill_vb2_buffer(),
 * since it isn't known until after dma_buf_get()..
 */
planes[plane].length = dbuf->size;
/* Skip the plane if already verified */ if (dbuf == vb->planes[plane].dbuf) {
	planes[plane].length = dbuf->size;
dma_buf_put(dbuf);
continue;
}
@@ -1072,7 +1070,7 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
/* Release previously acquired memory if present */
if (vb->planes[plane].mem_priv) {
	call_memop(q, plane, detach_dmabuf,
	call_memop(q, detach_dmabuf,
	vb->planes[plane].mem_priv);
dma_buf_put(vb->planes[plane].dbuf);
}
@@ -1080,8 +1078,8 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) vb->planes[plane].mem_priv = NULL;
/* Acquire each plane's memory */
mem_priv = q->mem_ops->attach_dmabuf(
		q->alloc_ctx[plane], dbuf);
mem_priv = call_memop(q, attach_dmabuf, q->alloc_ctx[plane],
	dbuf, q->plane_sizes[plane], write);
if (IS_ERR(mem_priv)) { dprintk(1, "qbuf: failed acquiring dmabuf " "memory for plane %d\n", plane);
@@ -1089,6 +1087,7 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) goto err; }
planes[plane].length = dbuf->size;
vb->planes[plane].dbuf = dbuf; vb->planes[plane].mem_priv = mem_priv; }
@@ -1098,8 +1097,7 @@ static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b) * the buffer(s).. */ for (plane = 0; plane < vb->num_planes; ++plane) {
ret = q->mem_ops->map_dmabuf(
		vb->planes[plane].mem_priv, write);
ret = call_memop(q, map_dmabuf, vb->planes[plane].mem_priv);
if (ret) { dprintk(1, "qbuf: failed mapping dmabuf " "memory for plane %d\n", plane);
@@ -1527,7 +1525,7 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking) */ if (q->memory == V4L2_MEMORY_DMABUF) for (plane = 0; plane < vb->num_planes; ++plane)
	call_memop(q, plane, unmap_dmabuf,
	call_memop(q, unmap_dmabuf,
	vb->planes[plane].mem_priv);
switch (vb->state) {
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index d8b8171..412c6a4 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -88,10 +88,10 @@ struct vb2_mem_ops { * in the vb2 core, and vb2_mem_ops really just need to get/put the * sglist (and make sure that the sglist fits it's needs..) */
void *(*attach_dmabuf)(void *alloc_ctx,
			  struct dma_buf *dbuf);
void *(*attach_dmabuf)(void *alloc_ctx, struct dma_buf *dbuf,
		unsigned long size, int write);
void (*detach_dmabuf)(void *buf_priv);
int (*map_dmabuf)(void *buf_priv, int write);

int (*map_dmabuf)(void *buf_priv); void (*unmap_dmabuf)(void *buf_priv);

void *(*vaddr)(void *buf_priv);

-- Regards, Laurent Pinchart

Tomasz Stanislawski

13 Mar 13 Mar

10:17 a.m.

New subject: [RFCv2 PATCH 2/9] v4l: vb2-dma-contig: update and code refactoring

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are: - functions were reordered - move compression of scatterlist to separete function - add support for multichunk but contiguous scatterlists - simplified implementation of vb2-dma-contig context structure - let mmap method to use dma_mmap_writecombine - add support for scatterlist in userptr mode

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com [mmap method] Signed-off-by: Andrzej Pietrasiewicz andrzej.p@samsung.com [scatterlist in userptr mode] Signed-off-by: Kamil Debski k.debski@samsung.com [bugfixing] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [core refactoring, helper functions] Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/videobuf2-dma-contig.c | 495 +++++++++++++++++++++++----- 1 files changed, 414 insertions(+), 81 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index f17ad98..c1dc043 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -10,173 +10,506 @@ * the Free Software Foundation. */

+#include <linux/dma-buf.h> +#include <linux/dma-mapping.h> #include <linux/module.h> +#include <linux/scatterlist.h> +#include <linux/sched.h> #include <linux/slab.h> -#include <linux/dma-mapping.h>

#include <media/videobuf2-core.h> #include <media/videobuf2-memops.h>

-struct vb2_dc_conf { - struct device *dev; -}; - struct vb2_dc_buf { - struct vb2_dc_conf *conf; + struct device *dev; void *vaddr; - dma_addr_t dma_addr; unsigned long size; - struct vm_area_struct *vma; - atomic_t refcount; + dma_addr_t dma_addr; + struct sg_table *dma_sgt; + enum dma_data_direction dma_dir; + + /* MMAP related */ struct vb2_vmarea_handler handler; + atomic_t refcount; + struct sg_table *sgt_base; + + /* USERPTR related */ + struct vm_area_struct *vma; };

-static void vb2_dma_contig_put(void *buf_priv); +/*********************************************/ +/* scatterlist table functions */ +/*********************************************/

-static void *vb2_dma_contig_alloc(void *alloc_ctx, unsigned long size) +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned long n_pages, size_t offset, size_t offset2) { - struct vb2_dc_conf *conf = alloc_ctx; - struct vb2_dc_buf *buf; + struct sg_table *sgt; + int i, j; /* loop counters */ + int cur_page, chunks; + int ret; + struct scatterlist *s;

- buf = kzalloc(sizeof *buf, GFP_KERNEL); - if (!buf) + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) return ERR_PTR(-ENOMEM);

- buf->vaddr = dma_alloc_coherent(conf->dev, size, &buf->dma_addr, - GFP_KERNEL); - if (!buf->vaddr) { - dev_err(conf->dev, "dma_alloc_coherent of size %ld failed\n", - size); - kfree(buf); + /* compute number of chunks */ + chunks = 1; + for (i = 1; i < n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); return ERR_PTR(-ENOMEM); }

- buf->conf = conf; - buf->size = size; - - buf->handler.refcount = &buf->refcount; - buf->handler.put = vb2_dma_contig_put; - buf->handler.arg = buf; + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { + size_t size = PAGE_SIZE; + + for (j = cur_page + 1; j < n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; + size += PAGE_SIZE; + } + + /* cut offset if chunk starts at the first page */ + if (cur_page == 0) + size -= offset; + /* cut offset2 if chunk ends at the last page */ + if (j == n_pages) + size -= offset2; + + sg_set_page(s, pages[cur_page], size, offset); + offset = 0; + cur_page = j; + }

- atomic_inc(&buf->refcount); + return sgt; +}

- return buf; +static void vb2_dc_release_sgtable(struct sg_table *sgt) +{ + sg_free_table(sgt); + kfree(sgt); }

-static void vb2_dma_contig_put(void *buf_priv) +static void vb2_dc_put_sgtable(struct sg_table *sgt, int dirty) { - struct vb2_dc_buf *buf = buf_priv; + struct scatterlist *s; + int i, j; + + for_each_sg(sgt->sgl, s, sgt->nents, i) { + struct page *page = sg_page(s); + int n_pages = PAGE_ALIGN(s->offset + s->length) >> PAGE_SHIFT; + + for (j = 0; j < n_pages; ++j, ++page) { + if (dirty) + set_page_dirty_lock(page); + put_page(page); + } + } + + vb2_dc_release_sgtable(sgt); +}

- if (atomic_dec_and_test(&buf->refcount)) { - dma_free_coherent(buf->conf->dev, buf->size, buf->vaddr, - buf->dma_addr); - kfree(buf); +static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt) +{ + struct scatterlist *s; + dma_addr_t expected = sg_dma_address(sgt->sgl); + int i; + unsigned long size = 0; + + for_each_sg(sgt->sgl, s, sgt->nents, i) { + if (sg_dma_address(s) != expected) + break; + expected = sg_dma_address(s) + sg_dma_len(s); + size += sg_dma_len(s); } + return size; }

-static void *vb2_dma_contig_cookie(void *buf_priv) +/*********************************************/ +/* callbacks for all buffers */ +/*********************************************/ + +static void *vb2_dc_cookie(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

return &buf->dma_addr; }

-static void *vb2_dma_contig_vaddr(void *buf_priv) +static void *vb2_dc_vaddr(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv; - if (!buf) - return 0;

return buf->vaddr; }

-static unsigned int vb2_dma_contig_num_users(void *buf_priv) +static unsigned int vb2_dc_num_users(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

return atomic_read(&buf->refcount); }

-static int vb2_dma_contig_mmap(void *buf_priv, struct vm_area_struct *vma) +static void vb2_dc_prepare(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv; + struct sg_table *sgt = buf->dma_sgt;

- if (!buf) { - printk(KERN_ERR "No buffer to map\n"); - return -EINVAL; - } + if (!sgt) + return; + + dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); +} + +static void vb2_dc_finish(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!sgt) + return; + + dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); +} + +/*********************************************/ +/* callbacks for MMAP buffers */ +/*********************************************/ + +static void vb2_dc_put(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv;

- return vb2_mmap_pfn_range(vma, buf->dma_addr, buf->size, - &vb2_common_vm_ops, &buf->handler); + if (!atomic_dec_and_test(&buf->refcount)) + return; + + vb2_dc_release_sgtable(buf->sgt_base); + dma_free_coherent(buf->dev, buf->size, buf->vaddr, + buf->dma_addr); + kfree(buf); }

-static void *vb2_dma_contig_get_userptr(void *alloc_ctx, unsigned long vaddr, - unsigned long size, int write) +static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { + struct device *dev = alloc_ctx; struct vb2_dc_buf *buf; - struct vm_area_struct *vma; - dma_addr_t dma_addr = 0; int ret; + int n_pages; + struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

- ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr); - if (ret) { - printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n", - vaddr); - kfree(buf); - return ERR_PTR(ret); + buf->dev = dev; + buf->size = size; + buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr, + GFP_KERNEL); + + ret = -ENOMEM; + if (!buf->vaddr) { + dev_err(dev, "dma_alloc_coherent of size %ld failed\n", + size); + goto fail_buf; }

- buf->size = size; - buf->dma_addr = dma_addr; - buf->vma = vma; + WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK); + WARN_ON(buf->dma_addr & ~PAGE_MASK); + + n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; + + pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL); + if (!pages) { + printk(KERN_ERR "failed to alloc page table\n"); + goto fail_dma; + } + + ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages); + if (ret < 0) { + printk(KERN_ERR "failed to get buffer pages from DMA API\n"); + goto fail_pages; + } + if (ret != n_pages) { + ret = -EFAULT; + printk(KERN_ERR "failed to get all pages from DMA API\n"); + goto fail_pages; + } + + buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0); + if (IS_ERR(buf->sgt_base)) { + ret = PTR_ERR(buf->sgt_base); + printk(KERN_ERR "failed to prepare sg table\n"); + goto fail_pages; + } + + /* pages are no longer needed */ + kfree(pages); + + buf->handler.refcount = &buf->refcount; + buf->handler.put = vb2_dc_put; + buf->handler.arg = buf; + + atomic_inc(&buf->refcount);

return buf; + +fail_pages: + kfree(pages); + +fail_dma: + dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr); + +fail_buf: + kfree(buf); + + return ERR_PTR(ret); +} + +static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) +{ + struct vb2_dc_buf *buf = buf_priv; + int ret; + + /* + * dma_mmap_* uses vm_pgoff as in-buffer offset, but we want to + * map whole buffer + */ + vma->vm_pgoff = 0; + + ret = dma_mmap_writecombine(buf->dev, vma, buf->vaddr, + buf->dma_addr, buf->size); + + if (ret) { + printk(KERN_ERR "Remapping memory failed, error: %d\n", ret); + return ret; + } + + vma->vm_flags |= VM_DONTEXPAND | VM_RESERVED; + vma->vm_private_data = &buf->handler; + vma->vm_ops = &vb2_common_vm_ops; + + vma->vm_ops->open(vma); + + printk(KERN_DEBUG "%s: mapped dma addr 0x%08lx at 0x%08lx, size %ld\n", + __func__, (unsigned long)buf->dma_addr, vma->vm_start, + buf->size); + + return 0; }

-static void vb2_dma_contig_put_userptr(void *mem_priv) +/*********************************************/ +/* callbacks for USERPTR buffers */ +/*********************************************/ + +static inline int vma_is_io(struct vm_area_struct *vma) { - struct vb2_dc_buf *buf = mem_priv; + return !!(vma->vm_flags & (VM_IO | VM_PFNMAP)); +}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages, + int n_pages, struct vm_area_struct **copy_vma, int write) +{ + struct vm_area_struct *vma; + int n = 0; /* number of get pages */ + int ret = -EFAULT; + + /* entering critical section for mm access */ + down_read(&current->mm->mmap_sem); + + vma = find_vma(current->mm, start); + if (!vma) { + printk(KERN_ERR "no vma for address %lu\n", start); + goto cleanup; + } + + if (vma_is_io(vma)) { + unsigned long pfn; + + if (vma->vm_end - start < n_pages * PAGE_SIZE) { + printk(KERN_ERR "vma is too small\n"); + goto cleanup; + } + + for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) { + ret = follow_pfn(vma, start, &pfn); + if (ret) { + printk(KERN_ERR "no page for address %lu\n", + start); + goto cleanup; + } + pages[n] = pfn_to_page(pfn); + get_page(pages[n]); + } + } else { + n = get_user_pages(current, current->mm, start & PAGE_MASK, + n_pages, write, 1, pages, NULL); + if (n != n_pages) { + printk(KERN_ERR "got only %d of %d user pages\n", + n, n_pages); + goto cleanup; + } + } + + *copy_vma = vb2_get_vma(vma); + if (!*copy_vma) { + printk(KERN_ERR "failed to copy vma\n"); + ret = -ENOMEM; + goto cleanup; + } + + /* leaving critical section for mm access */ + up_read(&current->mm->mmap_sem); + + return 0; + +cleanup: + up_read(&current->mm->mmap_sem); + + /* putting user pages if used, can be done wothout the lock */ + while (n) + put_page(pages[--n]); + + return ret; +} + +static void vb2_dc_put_userptr(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + struct sg_table *sgt = buf->dma_sgt; + + dma_unmap_sg(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir); + vb2_dc_put_sgtable(sgt, !vma_is_io(buf->vma)); + vb2_put_vma(buf->vma); + kfree(buf); +} + +static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr, + unsigned long size, int write) +{ + struct vb2_dc_buf *buf; + unsigned long start, end, offset, offset2; + struct page **pages; + int n_pages; + int ret = 0; + struct sg_table *sgt; + unsigned long contig_size; + + buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) - return; + return ERR_PTR(-ENOMEM); + + buf->dev = alloc_ctx; + buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + + start = (unsigned long)vaddr & PAGE_MASK; + offset = (unsigned long)vaddr & ~PAGE_MASK; + end = PAGE_ALIGN((unsigned long)vaddr + size); + offset2 = end - (unsigned long)vaddr - size; + n_pages = (end - start) >> PAGE_SHIFT; + + pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL); + if (!pages) { + ret = -ENOMEM; + printk(KERN_ERR "failed to allocate pages table\n"); + goto fail_buf; + } + + /* extract page list from userspace mapping */ + ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write); + if (ret) { + printk(KERN_ERR "failed to get user pages\n"); + goto fail_pages; + } + + sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2); + if (!sgt) { + printk(KERN_ERR "failed to create scatterlist table\n"); + ret = -ENOMEM; + goto fail_get_pages; + } + + /* pages are no longer needed */ + kfree(pages); + pages = NULL; + + sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents, + buf->dma_dir); + if (sgt->nents <= 0) { + printk(KERN_ERR "failed to map scatterlist\n"); + ret = -EIO; + goto fail_sgt; + } + + contig_size = vb2_dc_get_contiguous_size(sgt); + if (contig_size < size) { + printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n", + contig_size, size); + ret = -EFAULT; + goto fail_map_sg; + } + + buf->dma_addr = sg_dma_address(sgt->sgl); + buf->size = size; + buf->dma_sgt = sgt; + + atomic_inc(&buf->refcount); + + return buf; + +fail_map_sg: + dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);

+fail_sgt: + vb2_dc_put_sgtable(sgt, 0); + +fail_get_pages: + while (pages && n_pages) + put_page(pages[--n_pages]); vb2_put_vma(buf->vma); + +fail_pages: + kfree(pages); /* kfree is NULL-proof */ + +fail_buf: kfree(buf); + + return ERR_PTR(ret); }

+/*********************************************/ +/* DMA CONTIG exported functions */ +/*********************************************/ + const struct vb2_mem_ops vb2_dma_contig_memops = { - .alloc = vb2_dma_contig_alloc, - .put = vb2_dma_contig_put, - .cookie = vb2_dma_contig_cookie, - .vaddr = vb2_dma_contig_vaddr, - .mmap = vb2_dma_contig_mmap, - .get_userptr = vb2_dma_contig_get_userptr, - .put_userptr = vb2_dma_contig_put_userptr, - .num_users = vb2_dma_contig_num_users, + .alloc = vb2_dc_alloc, + .put = vb2_dc_put, + .cookie = vb2_dc_cookie, + .vaddr = vb2_dc_vaddr, + .mmap = vb2_dc_mmap, + .get_userptr = vb2_dc_get_userptr, + .put_userptr = vb2_dc_put_userptr, + .prepare = vb2_dc_prepare, + .finish = vb2_dc_finish, + .num_users = vb2_dc_num_users, }; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

void *vb2_dma_contig_init_ctx(struct device *dev) { - struct vb2_dc_conf *conf; - - conf = kzalloc(sizeof *conf, GFP_KERNEL); - if (!conf) - return ERR_PTR(-ENOMEM); - - conf->dev = dev; - - return conf; + return dev; } EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);

void vb2_dma_contig_cleanup_ctx(void *alloc_ctx) { - kfree(alloc_ctx); } EXPORT_SYMBOL_GPL(vb2_dma_contig_cleanup_ctx);

-- 1.7.5.4

Laurent Pinchart

22 Mar 22 Mar

9:27 a.m.

New subject: [RFCv2 PATCH 2/9] v4l: vb2-dma-contig: update and code refactoring

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:00 Tomasz Stanislawski wrote:

...

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

Combining all this makes it pretty difficult to review the patch. What about splitting the patch into the following ?

- Function rename (s/vb2_dma_contig_/vb2_dc_/) - Code reordering - Replace custom alloc_ctx structure with direct struct device usage - The rest, possibly further split into MMAP and USERPTR changes

That would make the patch easier to review. I've already split it according to the above (with the MMAP and USERPTR changes kept together as the original patch). I'll post the patches in reply to this mail, with my comments as further replies.

-- Regards, Laurent Pinchart

Laurent Pinchart

10:02 a.m.

New subject: [RFCv2 PATCH 2/9 - 1/4] v4l: vb2-dma-contig: Shorten vb2_dma_contig prefix to vb2_dc

Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com --- drivers/media/video/videobuf2-dma-contig.c | 36 ++++++++++++++-------------- 1 files changed, 18 insertions(+), 18 deletions(-)

Needless to say, feel free to modify authorship information for these patches based on the original authors.

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index f17ad98..5207eb1 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -31,9 +31,9 @@ struct vb2_dc_buf { struct vb2_vmarea_handler handler; };

-static void vb2_dma_contig_put(void *buf_priv); +static void vb2_dc_put(void *buf_priv);

-static void *vb2_dma_contig_alloc(void *alloc_ctx, unsigned long size) +static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct vb2_dc_conf *conf = alloc_ctx; struct vb2_dc_buf *buf; @@ -55,7 +55,7 @@ static void *vb2_dma_contig_alloc(void *alloc_ctx, unsigned long size) buf->size = size;

buf->handler.refcount = &buf->refcount; - buf->handler.put = vb2_dma_contig_put; + buf->handler.put = vb2_dc_put; buf->handler.arg = buf;

atomic_inc(&buf->refcount); @@ -63,7 +63,7 @@ static void *vb2_dma_contig_alloc(void *alloc_ctx, unsigned long size) return buf; }

-static void vb2_dma_contig_put(void *buf_priv) +static void vb2_dc_put(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

@@ -74,14 +74,14 @@ static void vb2_dma_contig_put(void *buf_priv) } }

-static void *vb2_dma_contig_cookie(void *buf_priv) +static void *vb2_dc_cookie(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

return &buf->dma_addr; }

-static void *vb2_dma_contig_vaddr(void *buf_priv) +static void *vb2_dc_vaddr(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv; if (!buf) @@ -90,14 +90,14 @@ static void *vb2_dma_contig_vaddr(void *buf_priv) return buf->vaddr; }

-static unsigned int vb2_dma_contig_num_users(void *buf_priv) +static unsigned int vb2_dc_num_users(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

return atomic_read(&buf->refcount); }

-static int vb2_dma_contig_mmap(void *buf_priv, struct vm_area_struct *vma) +static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) { struct vb2_dc_buf *buf = buf_priv;

@@ -110,7 +110,7 @@ static int vb2_dma_contig_mmap(void *buf_priv, struct vm_area_struct *vma) &vb2_common_vm_ops, &buf->handler); }

-static void *vb2_dma_contig_get_userptr(void *alloc_ctx, unsigned long vaddr, +static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr, unsigned long size, int write) { struct vb2_dc_buf *buf; @@ -137,7 +137,7 @@ static void *vb2_dma_contig_get_userptr(void *alloc_ctx, unsigned long vaddr, return buf; }

-static void vb2_dma_contig_put_userptr(void *mem_priv) +static void vb2_dc_put_userptr(void *mem_priv) { struct vb2_dc_buf *buf = mem_priv;

@@ -149,14 +149,14 @@ static void vb2_dma_contig_put_userptr(void *mem_priv) }

const struct vb2_mem_ops vb2_dma_contig_memops = { - .alloc = vb2_dma_contig_alloc, - .put = vb2_dma_contig_put, - .cookie = vb2_dma_contig_cookie, - .vaddr = vb2_dma_contig_vaddr, - .mmap = vb2_dma_contig_mmap, - .get_userptr = vb2_dma_contig_get_userptr, - .put_userptr = vb2_dma_contig_put_userptr, - .num_users = vb2_dma_contig_num_users, + .alloc = vb2_dc_alloc, + .put = vb2_dc_put, + .cookie = vb2_dc_cookie, + .vaddr = vb2_dc_vaddr, + .mmap = vb2_dc_mmap, + .get_userptr = vb2_dc_get_userptr, + .put_userptr = vb2_dc_put_userptr, + .num_users = vb2_dc_num_users, }; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

-- 1.7.3.4

Laurent Pinchart

10:02 a.m.

New subject: [RFCv2 PATCH 2/9 - 2/4] v4l: vb2-dma-contig: Reorder functions

Group functions by buffer type.

Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com --- drivers/media/video/videobuf2-dma-contig.c | 94 ++++++++++++++++----------- 1 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index 5207eb1..6bc4651 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -24,14 +24,58 @@ struct vb2_dc_conf { struct vb2_dc_buf { struct vb2_dc_conf *conf; void *vaddr; - dma_addr_t dma_addr; unsigned long size; - struct vm_area_struct *vma; - atomic_t refcount; + dma_addr_t dma_addr; + + /* MMAP related */ struct vb2_vmarea_handler handler; + atomic_t refcount; + + /* USERPTR related */ + struct vm_area_struct *vma; };

-static void vb2_dc_put(void *buf_priv); +/*********************************************/ +/* callbacks for all buffers */ +/*********************************************/ + +static void *vb2_dc_cookie(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + + return &buf->dma_addr; +} + +static void *vb2_dc_vaddr(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + if (!buf) + return 0; + + return buf->vaddr; +} + +static unsigned int vb2_dc_num_users(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + + return atomic_read(&buf->refcount); +} + +/*********************************************/ +/* callbacks for MMAP buffers */ +/*********************************************/ + +static void vb2_dc_put(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + + if (atomic_dec_and_test(&buf->refcount)) { + dma_free_coherent(buf->conf->dev, buf->size, buf->vaddr, + buf->dma_addr); + kfree(buf); + } +}

static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { @@ -63,40 +107,6 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) return buf; }

-static void vb2_dc_put(void *buf_priv) -{ - struct vb2_dc_buf *buf = buf_priv; - - if (atomic_dec_and_test(&buf->refcount)) { - dma_free_coherent(buf->conf->dev, buf->size, buf->vaddr, - buf->dma_addr); - kfree(buf); - } -} - -static void *vb2_dc_cookie(void *buf_priv) -{ - struct vb2_dc_buf *buf = buf_priv; - - return &buf->dma_addr; -} - -static void *vb2_dc_vaddr(void *buf_priv) -{ - struct vb2_dc_buf *buf = buf_priv; - if (!buf) - return 0; - - return buf->vaddr; -} - -static unsigned int vb2_dc_num_users(void *buf_priv) -{ - struct vb2_dc_buf *buf = buf_priv; - - return atomic_read(&buf->refcount); -} - static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) { struct vb2_dc_buf *buf = buf_priv; @@ -110,6 +120,10 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) &vb2_common_vm_ops, &buf->handler); }

+/*********************************************/ +/* callbacks for USERPTR buffers */ +/*********************************************/ + static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr, unsigned long size, int write) { @@ -148,6 +162,10 @@ static void vb2_dc_put_userptr(void *mem_priv) kfree(buf); }

-- 1.7.3.4

Laurent Pinchart

10:02 a.m.

New subject: [RFCv2 PATCH 2/9 - 3/4] v4l: vb2-dma-contig: Remove unneeded allocation context structure

vb2-dma-contig returns a vb2_dc_conf structure instance as the vb2 allocation context. That structure only stores a pointer to the physical device. Remove it and use the device pointer directly as the allocation context.

Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com --- drivers/media/video/videobuf2-dma-contig.c | 29 ++++++--------------------- 1 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index 6bc4651..c898e6f 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -17,12 +17,8 @@ #include <media/videobuf2-core.h> #include <media/videobuf2-memops.h>

-struct vb2_dc_conf { - struct device *dev; -}; - struct vb2_dc_buf { - struct vb2_dc_conf *conf; + struct device *dev; void *vaddr; unsigned long size; dma_addr_t dma_addr; @@ -71,7 +67,7 @@ static void vb2_dc_put(void *buf_priv) struct vb2_dc_buf *buf = buf_priv;

if (atomic_dec_and_test(&buf->refcount)) { - dma_free_coherent(buf->conf->dev, buf->size, buf->vaddr, + dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr); kfree(buf); } @@ -79,23 +75,21 @@ static void vb2_dc_put(void *buf_priv)

static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { - struct vb2_dc_conf *conf = alloc_ctx; + struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

- buf->vaddr = dma_alloc_coherent(conf->dev, size, &buf->dma_addr, - GFP_KERNEL); + buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr, GFP_KERNEL); if (!buf->vaddr) { - dev_err(conf->dev, "dma_alloc_coherent of size %ld failed\n", - size); + dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size); kfree(buf); return ERR_PTR(-ENOMEM); }

- buf->conf = conf; + buf->dev = dev; buf->size = size;

buf->handler.refcount = &buf->refcount; @@ -180,21 +174,12 @@ EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

void vb2_dma_contig_cleanup_ctx(void *alloc_ctx) { - kfree(alloc_ctx); } EXPORT_SYMBOL_GPL(vb2_dma_contig_cleanup_ctx);

-- 1.7.3.4

Laurent Pinchart

10:02 a.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

From: Tomasz Stanislawski t.stanislaws@samsung.com

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com [mmap method] Signed-off-by: Andrzej Pietrasiewicz andrzej.p@samsung.com [scatterlist in userptr mode] Signed-off-by: Kamil Debski k.debski@samsung.com [bugfixing] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [core refactoring, helper functions] Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/videobuf2-dma-contig.c | 400 +++++++++++++++++++++++++--- 1 files changed, 365 insertions(+), 35 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c898e6f..9965465 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -10,9 +10,12 @@ * the Free Software Foundation. */

+#include <linux/dma-buf.h> +#include <linux/dma-mapping.h> #include <linux/module.h> +#include <linux/scatterlist.h> +#include <linux/sched.h> #include <linux/slab.h> -#include <linux/dma-mapping.h>

#include <media/videobuf2-core.h> #include <media/videobuf2-memops.h> @@ -22,16 +25,115 @@ struct vb2_dc_buf { void *vaddr; unsigned long size; dma_addr_t dma_addr; + struct sg_table *dma_sgt; + enum dma_data_direction dma_dir;

/* MMAP related */ struct vb2_vmarea_handler handler; atomic_t refcount; + struct sg_table *sgt_base;

/* USERPTR related */ struct vm_area_struct *vma; };

/*********************************************/ +/* scatterlist table functions */ +/*********************************************/ + +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned long n_pages, size_t offset, size_t offset2) +{ + struct sg_table *sgt; + int i, j; /* loop counters */ + int cur_page, chunks; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i < n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { + size_t size = PAGE_SIZE; + + for (j = cur_page + 1; j < n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; + size += PAGE_SIZE; + } + + /* cut offset if chunk starts at the first page */ + if (cur_page == 0) + size -= offset; + /* cut offset2 if chunk ends at the last page */ + if (j == n_pages) + size -= offset2; + + sg_set_page(s, pages[cur_page], size, offset); + offset = 0; + cur_page = j; + } + + return sgt; +} + +static void vb2_dc_release_sgtable(struct sg_table *sgt) +{ + sg_free_table(sgt); + kfree(sgt); +} + +static void vb2_dc_put_sgtable(struct sg_table *sgt, int dirty) +{ + struct scatterlist *s; + int i, j; + + for_each_sg(sgt->sgl, s, sgt->nents, i) { + struct page *page = sg_page(s); + int n_pages = PAGE_ALIGN(s->offset + s->length) >> PAGE_SHIFT; + + for (j = 0; j < n_pages; ++j, ++page) { + if (dirty) + set_page_dirty_lock(page); + put_page(page); + } + } + + vb2_dc_release_sgtable(sgt); +} + +static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt) +{ + struct scatterlist *s; + dma_addr_t expected = sg_dma_address(sgt->sgl); + int i; + unsigned long size = 0; + + for_each_sg(sgt->sgl, s, sgt->nents, i) { + if (sg_dma_address(s) != expected) + break; + expected = sg_dma_address(s) + sg_dma_len(s); + size += sg_dma_len(s); + } + return size; +} + +/*********************************************/ /* callbacks for all buffers */ /*********************************************/

@@ -45,8 +147,6 @@ static void *vb2_dc_cookie(void *buf_priv) static void *vb2_dc_vaddr(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv; - if (!buf) - return 0;

return buf->vaddr; } @@ -58,6 +158,28 @@ static unsigned int vb2_dc_num_users(void *buf_priv) return atomic_read(&buf->refcount); }

+static void vb2_dc_prepare(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!sgt) + return; + + dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); +} + +static void vb2_dc_finish(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!sgt) + return; + + dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); +} + /*********************************************/ /* callbacks for MMAP buffers */ /*********************************************/ @@ -66,31 +188,70 @@ static void vb2_dc_put(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

- if (atomic_dec_and_test(&buf->refcount)) { - dma_free_coherent(buf->dev, buf->size, buf->vaddr, - buf->dma_addr); - kfree(buf); - } + if (!atomic_dec_and_test(&buf->refcount)) + return; + + vb2_dc_release_sgtable(buf->sgt_base); + dma_free_coherent(buf->dev, buf->size, buf->vaddr, + buf->dma_addr); + kfree(buf); }

static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct device *dev = alloc_ctx; struct vb2_dc_buf *buf; + int ret; + int n_pages; + struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

- buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr, GFP_KERNEL); + buf->dev = dev; + buf->size = size; + buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr, + GFP_KERNEL); + + ret = -ENOMEM; if (!buf->vaddr) { - dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size); - kfree(buf); - return ERR_PTR(-ENOMEM); + dev_err(dev, "dma_alloc_coherent of size %ld failed\n", + size); + goto fail_buf; }

- buf->dev = dev; - buf->size = size; + WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK); + WARN_ON(buf->dma_addr & ~PAGE_MASK); + + n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; + + pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL); + if (!pages) { + printk(KERN_ERR "failed to alloc page table\n"); + goto fail_dma; + } + + ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages); + if (ret < 0) { + printk(KERN_ERR "failed to get buffer pages from DMA API\n"); + goto fail_pages; + } + if (ret != n_pages) { + ret = -EFAULT; + printk(KERN_ERR "failed to get all pages from DMA API\n"); + goto fail_pages; + } + + buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0); + if (IS_ERR(buf->sgt_base)) { + ret = PTR_ERR(buf->sgt_base); + printk(KERN_ERR "failed to prepare sg table\n"); + goto fail_pages; + } + + /* pages are no longer needed */ + kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put; @@ -99,59 +260,226 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) atomic_inc(&buf->refcount);

return buf; + +fail_pages: + kfree(pages); + +fail_dma: + dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr); + +fail_buf: + kfree(buf); + + return ERR_PTR(ret); }

static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) { struct vb2_dc_buf *buf = buf_priv; + int ret; + + /* + * dma_mmap_* uses vm_pgoff as in-buffer offset, but we want to + * map whole buffer + */ + vma->vm_pgoff = 0; + + ret = dma_mmap_writecombine(buf->dev, vma, buf->vaddr, + buf->dma_addr, buf->size);

- if (!buf) { - printk(KERN_ERR "No buffer to map\n"); - return -EINVAL; + if (ret) { + printk(KERN_ERR "Remapping memory failed, error: %d\n", ret); + return ret; }

- return vb2_mmap_pfn_range(vma, buf->dma_addr, buf->size, - &vb2_common_vm_ops, &buf->handler); + vma->vm_flags |= VM_DONTEXPAND | VM_RESERVED; + vma->vm_private_data = &buf->handler; + vma->vm_ops = &vb2_common_vm_ops; + + vma->vm_ops->open(vma); + + printk(KERN_DEBUG "%s: mapped dma addr 0x%08lx at 0x%08lx, size %ld\n", + __func__, (unsigned long)buf->dma_addr, vma->vm_start, + buf->size); + + return 0; }

/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{ + return !!(vma->vm_flags & (VM_IO | VM_PFNMAP)); +} + +static int vb2_dc_get_pages(unsigned long start, struct page **pages, + int n_pages, struct vm_area_struct **copy_vma, int write) +{ + struct vm_area_struct *vma; + int n = 0; /* number of get pages */ + int ret = -EFAULT; + + /* entering critical section for mm access */ + down_read(&current->mm->mmap_sem); + + vma = find_vma(current->mm, start); + if (!vma) { + printk(KERN_ERR "no vma for address %lu\n", start); + goto cleanup; + } + + if (vma_is_io(vma)) { + unsigned long pfn; + + if (vma->vm_end - start < n_pages * PAGE_SIZE) { + printk(KERN_ERR "vma is too small\n"); + goto cleanup; + } + + for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) { + ret = follow_pfn(vma, start, &pfn); + if (ret) { + printk(KERN_ERR "no page for address %lu\n", + start); + goto cleanup; + } + pages[n] = pfn_to_page(pfn); + get_page(pages[n]); + } + } else { + n = get_user_pages(current, current->mm, start & PAGE_MASK, + n_pages, write, 1, pages, NULL); + if (n != n_pages) { + printk(KERN_ERR "got only %d of %d user pages\n", + n, n_pages); + goto cleanup; + } + } + + *copy_vma = vb2_get_vma(vma); + if (!*copy_vma) { + printk(KERN_ERR "failed to copy vma\n"); + ret = -ENOMEM; + goto cleanup; + } + + /* leaving critical section for mm access */ + up_read(&current->mm->mmap_sem); + + return 0; + +cleanup: + up_read(&current->mm->mmap_sem); + + /* putting user pages if used, can be done wothout the lock */ + while (n) + put_page(pages[--n]); + + return ret; +} + static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr, - unsigned long size, int write) + unsigned long size, int write) { struct vb2_dc_buf *buf; - struct vm_area_struct *vma; - dma_addr_t dma_addr = 0; - int ret; + unsigned long start, end, offset, offset2; + struct page **pages; + int n_pages; + int ret = 0; + struct sg_table *sgt; + unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

- ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr); + buf->dev = alloc_ctx; + buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + + start = (unsigned long)vaddr & PAGE_MASK; + offset = (unsigned long)vaddr & ~PAGE_MASK; + end = PAGE_ALIGN((unsigned long)vaddr + size); + offset2 = end - (unsigned long)vaddr - size; + n_pages = (end - start) >> PAGE_SHIFT; + + pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL); + if (!pages) { + ret = -ENOMEM; + printk(KERN_ERR "failed to allocate pages table\n"); + goto fail_buf; + } + + /* extract page list from userspace mapping */ + ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write); if (ret) { - printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n", - vaddr); - kfree(buf); - return ERR_PTR(ret); + printk(KERN_ERR "failed to get user pages\n"); + goto fail_pages; + } + + sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2); + if (!sgt) { + printk(KERN_ERR "failed to create scatterlist table\n"); + ret = -ENOMEM; + goto fail_get_pages; }

+ /* pages are no longer needed */ + kfree(pages); + pages = NULL; + + sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents, + buf->dma_dir); + if (sgt->nents <= 0) { + printk(KERN_ERR "failed to map scatterlist\n"); + ret = -EIO; + goto fail_sgt; + } + + contig_size = vb2_dc_get_contiguous_size(sgt); + if (contig_size < size) { + printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n", + contig_size, size); + ret = -EFAULT; + goto fail_map_sg; + } + + buf->dma_addr = sg_dma_address(sgt->sgl); buf->size = size; - buf->dma_addr = dma_addr; - buf->vma = vma; + buf->dma_sgt = sgt; + + atomic_inc(&buf->refcount);

return buf; + +fail_map_sg: + dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); + +fail_sgt: + vb2_dc_put_sgtable(sgt, 0); + +fail_get_pages: + while (pages && n_pages) + put_page(pages[--n_pages]); + vb2_put_vma(buf->vma); + +fail_pages: + kfree(pages); /* kfree is NULL-proof */ + +fail_buf: + kfree(buf); + + return ERR_PTR(ret); }

-static void vb2_dc_put_userptr(void *mem_priv) +static void vb2_dc_put_userptr(void *buf_priv) { - struct vb2_dc_buf *buf = mem_priv; - - if (!buf) - return; + struct vb2_dc_buf *buf = buf_priv; + struct sg_table *sgt = buf->dma_sgt;

+ dma_unmap_sg(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir); + vb2_dc_put_sgtable(sgt, !vma_is_io(buf->vma)); vb2_put_vma(buf->vma); kfree(buf); } @@ -168,6 +496,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = { .mmap = vb2_dc_mmap, .get_userptr = vb2_dc_get_userptr, .put_userptr = vb2_dc_put_userptr, + .prepare = vb2_dc_prepare, + .finish = vb2_dc_finish, .num_users = vb2_dc_num_users, }; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

-- 1.7.3.4

Laurent Pinchart

10:50 a.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Tomasz,

Thanks for the patch.

On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

...

From: Tomasz Stanislawski t.stanislaws@samsung.com

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com [mmap method] Signed-off-by: Andrzej Pietrasiewicz andrzej.p@samsung.com [scatterlist in userptr mode] Signed-off-by: Kamil Debski k.debski@samsung.com [bugfixing] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [core refactoring, helper functions] Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/videobuf2-dma-contig.c | 400 +++++++++++++++++++++++++--- 1 files changed, 365 insertions(+), 35 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c898e6f..9965465 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -10,9 +10,12 @@

the Free Software Foundation.

*/

+#include <linux/dma-buf.h> +#include <linux/dma-mapping.h> #include <linux/module.h> +#include <linux/scatterlist.h> +#include <linux/sched.h> #include <linux/slab.h> -#include <linux/dma-mapping.h>

#include <media/videobuf2-core.h> #include <media/videobuf2-memops.h> @@ -22,16 +25,115 @@ struct vb2_dc_buf { void *vaddr; unsigned long size; dma_addr_t dma_addr;

struct sg_table *dma_sgt;

enum dma_data_direction dma_dir;

/* MMAP related */ struct vb2_vmarea_handler handler; atomic_t refcount;

struct sg_table *sgt_base;

/* USERPTR related */ struct vm_area_struct *vma;

};

/*********************************************/ +/* scatterlist table functions */ +/*********************************************/

+static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages,

unsigned long n_pages, size_t offset, size_t offset2)

+{
struct sg_table *sgt;

int i, j; /* loop counters */

int cur_page, chunks;

int ret;

struct scatterlist *s;

sgt = kzalloc(sizeof *sgt, GFP_KERNEL);

if (!sgt)
return ERR_PTR(-ENOMEM);
/* compute number of chunks */

chunks = 1;

for (i = 1; i < n_pages; ++i)
if (pages[i] != pages[i - 1] + 1)
	++chunks;
ret = sg_alloc_table(sgt, chunks, GFP_KERNEL);

if (ret) {
kfree(sgt);
return ERR_PTR(-ENOMEM);
}

/* merging chunks and putting them into the scatterlist */

cur_page = 0;

for_each_sg(sgt->sgl, s, sgt->orig_nents, i) {
size_t size = PAGE_SIZE;
for (j = cur_page + 1; j < n_pages; ++j) {
	if (pages[j] != pages[j - 1] + 1)
		break;
	size += PAGE_SIZE;
}
/* cut offset if chunk starts at the first page */
if (cur_page == 0)
	size -= offset;
/* cut offset2 if chunk ends at the last page */
if (j == n_pages)
	size -= offset2;
sg_set_page(s, pages[cur_page], size, offset);
offset = 0;
cur_page = j;
}

return sgt;
+}

+static void vb2_dc_release_sgtable(struct sg_table *sgt) +{

sg_free_table(sgt);

kfree(sgt);

+}

+static void vb2_dc_put_sgtable(struct sg_table *sgt, int dirty) +{
struct scatterlist *s;

int i, j;

for_each_sg(sgt->sgl, s, sgt->nents, i) {
struct page *page = sg_page(s);
int n_pages = PAGE_ALIGN(s->offset + s->length) >> PAGE_SHIFT;
for (j = 0; j < n_pages; ++j, ++page) {
	if (dirty)
		set_page_dirty_lock(page);
	put_page(page);
}
}

vb2_dc_release_sgtable(sgt);
+}

+static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt) +{
struct scatterlist *s;

dma_addr_t expected = sg_dma_address(sgt->sgl);

int i;

unsigned long size = 0;

for_each_sg(sgt->sgl, s, sgt->nents, i) {
if (sg_dma_address(s) != expected)
	break;
expected = sg_dma_address(s) + sg_dma_len(s);
size += sg_dma_len(s);
}

return size;
+}

+/*********************************************/ /* callbacks for all buffers */ /*********************************************/

@@ -45,8 +147,6 @@ static void *vb2_dc_cookie(void *buf_priv) static void *vb2_dc_vaddr(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;
if (!buf)
return 0;
return buf->vaddr;
} @@ -58,6 +158,28 @@ static unsigned int vb2_dc_num_users(void *buf_priv) return atomic_read(&buf->refcount); }

+static void vb2_dc_prepare(void *buf_priv) +{
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

if (!sgt)
return;
dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}

+static void vb2_dc_finish(void *buf_priv) +{
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

if (!sgt)
return;
dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}

/*********************************************/ /* callbacks for MMAP buffers */ /*********************************************/ @@ -66,31 +188,70 @@ static void vb2_dc_put(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;
if (atomic_dec_and_test(&buf->refcount)) {
dma_free_coherent(buf->dev, buf->size, buf->vaddr,
		  buf->dma_addr);
kfree(buf);
}
if (!atomic_dec_and_test(&buf->refcount))
return;
vb2_dc_release_sgtable(buf->sgt_base);

dma_free_coherent(buf->dev, buf->size, buf->vaddr,
buf->dma_addr);
kfree(buf);
}

static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int n_pages;

struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr, GFP_KERNEL);
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,
GFP_KERNEL);
ret = -ENOMEM; if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
return ERR_PTR(-ENOMEM);
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->dev = dev;

buf->size = size;
WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);

WARN_ON(buf->dma_addr & ~PAGE_MASK);

n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);

As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?

...

if (ret < 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}

buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.

...

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put;

@@ -99,59 +260,226 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) atomic_inc(&buf->refcount);

return buf;

+fail_pages:

kfree(pages);

+fail_dma:

dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) { struct vb2_dc_buf *buf = buf_priv;
int ret;

/*
* dma_mmap_* uses vm_pgoff as in-buffer offset, but we want to
* map whole buffer
*/
vma->vm_pgoff = 0;

ret = dma_mmap_writecombine(buf->dev, vma, buf->vaddr,
buf->dma_addr, buf->size);
if (!buf) {
printk(KERN_ERR "No buffer to map\n");
return -EINVAL;
if (ret) {
printk(KERN_ERR "Remapping memory failed, error: %d\n", ret);
return ret;
}
return vb2_mmap_pfn_range(vma, buf->dma_addr, buf->size,
		  &vb2_common_vm_ops, &buf->handler);
vma->vm_flags |= VM_DONTEXPAND | VM_RESERVED;

vma->vm_private_data = &buf->handler;

vma->vm_ops = &vb2_common_vm_ops;

vma->vm_ops->open(vma);

printk(KERN_DEBUG "%s: mapped dma addr 0x%08lx at 0x%08lx, size %ld\n",
__func__, (unsigned long)buf->dma_addr, vma->vm_start,
buf->size);
return 0;
}

/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));

Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

...

+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

...

vma = find_vma(current->mm, start);
if (!vma) {

printk(KERN_ERR "no vma for address %lu\n", start);

```
goto cleanup;
```
}
if (vma_is_io(vma)) {
```
unsigned long pfn;
```

if (vma->vm_end - start < n_pages * PAGE_SIZE) {

	printk(KERN_ERR "vma is too small\n");

```
	goto cleanup;
```
```
}
```

for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {

```
	ret = follow_pfn(vma, start, &pfn);
```
```
	if (ret) {
```

		printk(KERN_ERR "no page for address %lu\n",

```
			start);
```
```
		goto cleanup;
```
```
	}
```
```
	pages[n] = pfn_to_page(pfn);
```
```
	get_page(pages[n]);
```

This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.

...

```
}
```
} else {

n = get_user_pages(current, current->mm, start & PAGE_MASK,

```
	n_pages, write, 1, pages, NULL);
```
```
if (n != n_pages) {
```

	printk(KERN_ERR "got only %d of %d user pages\n",

```
		n, n_pages);
```
```
	goto cleanup;
```
```
}
```
}
*copy_vma = vb2_get_vma(vma);
if (!*copy_vma) {

printk(KERN_ERR "failed to copy vma\n");

```
ret = -ENOMEM;
```
```
goto cleanup;
```
}

Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.

...

/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;

+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
unsigned long size, int write)

{ struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0;

int ret;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr & PAGE_MASK;

offset = (unsigned long)vaddr & ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write); if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}

This looks overly complex to me. You create a multi-chunk sgt out of the user pointer address and map it completely, and then check if it starts with a big enough contiguous chunk. Why don't you create an sgt with a single continuous chunk then ? In the VM_PFNMAP case you could check whether the area is contiguous when you follow the PFNs, stop at the first discontinuity, and create an sgt with a single element right there. You would then need to call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first discontinuity as well.

...

/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl); buf->size = size;
buf->dma_addr = dma_addr;

buf->vma = vma;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;

+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);

I think this will break in the VM_PFNMAP case on non-coherent architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in that case, which can dereference struct page. As explain above, the struct page isn't valid with VM_PFNMAP. I haven't check the dma_map_sg() and dma_sync_sg_*() calls, but changes are they might break as well.

...

+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages && n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

-static void vb2_dc_put_userptr(void *mem_priv) +static void vb2_dc_put_userptr(void *buf_priv) {
struct vb2_dc_buf *buf = mem_priv;

if (!buf)
return;
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

dma_unmap_sg(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);

vb2_dc_put_sgtable(sgt, !vma_is_io(buf->vma)); vb2_put_vma(buf->vma); kfree(buf);

} @@ -168,6 +496,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = { .mmap = vb2_dc_mmap, .get_userptr = vb2_dc_get_userptr, .put_userptr = vb2_dc_put_userptr,

.prepare = vb2_dc_prepare,

.finish = vb2_dc_finish, .num_users = vb2_dc_num_users,

}; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

-- Regards, Laurent Pinchart

Tomasz Stanislawski

1:36 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Laurent, Thank you very much for your comments and question. They were very useful. Please refer to the comments below.

On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...

Hi Tomasz,

Thanks for the patch.

On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

...
From: Tomasz Stanislawski t.stanislaws@samsung.com

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com [mmap method] Signed-off-by: Andrzej Pietrasiewicz andrzej.p@samsung.com [scatterlist in userptr mode] Signed-off-by: Kamil Debski k.debski@samsung.com [bugfixing] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [core refactoring, helper functions] Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/videobuf2-dma-contig.c | 400 +++++++++++++++++++++++++--- 1 files changed, 365 insertions(+), 35 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c898e6f..9965465 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -10,9 +10,12 @@

the Free Software Foundation.

*/

+#include <linux/dma-buf.h> +#include <linux/dma-mapping.h> #include <linux/module.h> +#include <linux/scatterlist.h> +#include <linux/sched.h> #include <linux/slab.h> -#include <linux/dma-mapping.h>

#include <media/videobuf2-core.h> #include <media/videobuf2-memops.h> @@ -22,16 +25,115 @@ struct vb2_dc_buf { void *vaddr; unsigned long size; dma_addr_t dma_addr;

struct sg_table *dma_sgt;

enum dma_data_direction dma_dir;

/* MMAP related */ struct vb2_vmarea_handler handler; atomic_t refcount;

struct sg_table *sgt_base;

/* USERPTR related */ struct vm_area_struct *vma;

};

/*********************************************/ +/* scatterlist table functions */ +/*********************************************/

+static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages,

unsigned long n_pages, size_t offset, size_t offset2)

+{
struct sg_table *sgt;

int i, j; /* loop counters */

int cur_page, chunks;

int ret;

struct scatterlist *s;

sgt = kzalloc(sizeof *sgt, GFP_KERNEL);

if (!sgt)
return ERR_PTR(-ENOMEM);
/* compute number of chunks */

chunks = 1;

for (i = 1; i < n_pages; ++i)
if (pages[i] != pages[i - 1] + 1)
	++chunks;
ret = sg_alloc_table(sgt, chunks, GFP_KERNEL);

if (ret) {
kfree(sgt);
return ERR_PTR(-ENOMEM);
}

/* merging chunks and putting them into the scatterlist */

cur_page = 0;

for_each_sg(sgt->sgl, s, sgt->orig_nents, i) {
size_t size = PAGE_SIZE;
for (j = cur_page + 1; j < n_pages; ++j) {
	if (pages[j] != pages[j - 1] + 1)
		break;
	size += PAGE_SIZE;
}
/* cut offset if chunk starts at the first page */
if (cur_page == 0)
	size -= offset;
/* cut offset2 if chunk ends at the last page */
if (j == n_pages)
	size -= offset2;
sg_set_page(s, pages[cur_page], size, offset);
offset = 0;
cur_page = j;
}

return sgt;
+}

+static void vb2_dc_release_sgtable(struct sg_table *sgt) +{

sg_free_table(sgt);

kfree(sgt);

+}

+static void vb2_dc_put_sgtable(struct sg_table *sgt, int dirty) +{
struct scatterlist *s;

int i, j;

for_each_sg(sgt->sgl, s, sgt->nents, i) {
struct page *page = sg_page(s);
int n_pages = PAGE_ALIGN(s->offset + s->length) >> PAGE_SHIFT;
for (j = 0; j < n_pages; ++j, ++page) {
	if (dirty)
		set_page_dirty_lock(page);
	put_page(page);
}
}

vb2_dc_release_sgtable(sgt);
+}

+static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt) +{
struct scatterlist *s;

dma_addr_t expected = sg_dma_address(sgt->sgl);

int i;

unsigned long size = 0;

for_each_sg(sgt->sgl, s, sgt->nents, i) {
if (sg_dma_address(s) != expected)
	break;
expected = sg_dma_address(s) + sg_dma_len(s);
size += sg_dma_len(s);
}

return size;
+}

+/*********************************************/ /* callbacks for all buffers */ /*********************************************/

@@ -45,8 +147,6 @@ static void *vb2_dc_cookie(void *buf_priv) static void *vb2_dc_vaddr(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;
if (!buf)
return 0;
return buf->vaddr;
} @@ -58,6 +158,28 @@ static unsigned int vb2_dc_num_users(void *buf_priv) return atomic_read(&buf->refcount); }

+static void vb2_dc_prepare(void *buf_priv) +{
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

if (!sgt)
return;
dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}

+static void vb2_dc_finish(void *buf_priv) +{
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

if (!sgt)
return;
dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}

/*********************************************/ /* callbacks for MMAP buffers */ /*********************************************/ @@ -66,31 +188,70 @@ static void vb2_dc_put(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;
if (atomic_dec_and_test(&buf->refcount)) {
dma_free_coherent(buf->dev, buf->size, buf->vaddr,
		  buf->dma_addr);
kfree(buf);
}
if (!atomic_dec_and_test(&buf->refcount))
return;
vb2_dc_release_sgtable(buf->sgt_base);

dma_free_coherent(buf->dev, buf->size, buf->vaddr,
buf->dma_addr);
kfree(buf);
}

static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int n_pages;http://165.213.219.115/cgi-bin/gitweb.cgi?p=mirror/linux-3.0-midas%3Ba=short...

struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr, GFP_KERNEL);
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,
GFP_KERNEL);
ret = -ENOMEM; if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
return ERR_PTR(-ENOMEM);
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->dev = dev;

buf->size = size;
WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);

WARN_ON(buf->dma_addr & ~PAGE_MASK);

n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);
As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?

The physical address should not be used since they are meaningless in a context of different devices. It seams that only the list of pages is more-or-less portable between different drivers.

The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

The sg list is not going to be single-entry if the device is provided with its own MMU.

...

...
if (ret < 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}
buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.

Good point. The sgt_base is used by exporter only. Thanks for noticing it.

...

...
/* pages are no longer needed */

kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put;

@@ -99,59 +260,226 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) atomic_inc(&buf->refcount);

return buf;

+fail_pages:

kfree(pages);

+fail_dma:

dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) { struct vb2_dc_buf *buf = buf_priv;
int ret;

/*
* dma_mmap_* uses vm_pgoff as in-buffer offset, but we want to
* map whole buffer
*/
vma->vm_pgoff = 0;

ret = dma_mmap_writecombine(buf->dev, vma, buf->vaddr,
buf->dma_addr, buf->size);
if (!buf) {
printk(KERN_ERR "No buffer to map\n");
return -EINVAL;
if (ret) {
printk(KERN_ERR "Remapping memory failed, error: %d\n", ret);
return ret;
}
return vb2_mmap_pfn_range(vma, buf->dma_addr, buf->size,
		  &vb2_common_vm_ops, &buf->handler);
vma->vm_flags |= VM_DONTEXPAND | VM_RESERVED;

vma->vm_private_data = &buf->handler;

vma->vm_ops = &vb2_common_vm_ops;

vma->vm_ops->open(vma);

printk(KERN_DEBUG "%s: mapped dma addr 0x%08lx at 0x%08lx, size %ld\n",
__func__, (unsigned long)buf->dma_addr, vma->vm_start,
buf->size);
return 0;
}

/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));
Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

Frankly, I found that that in get_user_pages flags are checked against (VM_IO | VM_PFNMAP). Probably for noMMU (not no IOMMU) case it is possible to get vma with VM_IO on and VM_PFNMAP off, isn't it?

The problem is that this framework should work in both cases so this check was added just in case :).

...

...
+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

Good point. Do you know any good solution to this problem?

...

...
vma = find_vma(current->mm, start);

if (!vma) {
printk(KERN_ERR "no vma for address %lu\n", start);
goto cleanup;
}

if (vma_is_io(vma)) {
unsigned long pfn;
if (vma->vm_end - start < n_pages * PAGE_SIZE) {
	printk(KERN_ERR "vma is too small\n");
	goto cleanup;
}
for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {
	ret = follow_pfn(vma, start, &pfn);
	if (ret) {
		printk(KERN_ERR "no page for address %lu\n",
			start);
		goto cleanup;
	}
	pages[n] = pfn_to_page(pfn);
	get_page(pages[n]);
This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.

You are completetly right. This is the corner case where list of pages is not a portable way of describing the memory. Maybe pfn_valid should be used to check validity of the page (pfn) before getting it?

...

...
}
} else {
n = get_user_pages(current, current->mm, start & PAGE_MASK,
	n_pages, write, 1, pages, NULL);
if (n != n_pages) {
	printk(KERN_ERR "got only %d of %d user pages\n",
		n, n_pages);
	goto cleanup;
}
}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {
printk(KERN_ERR "failed to copy vma\n");
ret = -ENOMEM;
goto cleanup;
}
Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.

I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.

...

...
/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;

+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
unsigned long size, int write)

{ struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0;

int ret;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr & PAGE_MASK;

offset = (unsigned long)vaddr & ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write); if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}
This looks overly complex to me. You create a multi-chunk sgt out of the user pointer address and map it completely, and then check if it starts with a big enough contiguous chunk.

Notice that vb2_dc_pages_to_sgt does compress contiguous ranges of pfns (pages). So if the memory is contiguous, then sigle chunk sglist is produced. The memory used to store pages list is just temporary. It is freed after the sglist is created.

...

Why don't you create an sgt with a single continuous chunk then ? In the VM_PFNMAP case you could check whether the area is contiguous when you follow the PFNs, stop at the first discontinuity, and create an sgt with a single element right there. You would then need to call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first discontinuity as well.

Discontinuity of pfns is not a problem if a device has own IOMMU. It is not known for vb2-dma-contig if mapping this multi-chunk sglist will succeed until calling and checking a result of dma_map_sg.

Why bothering if both VM_PFNMAP and non-VM_PFNMAP are handled in the same way after list of pages is obtained? Trating them the same way allows to reuse code and simplify the program flow.

The DMA framework does not provide any way to force single chunk mapping in sg. If the device is capable of mapping discontinous pages into a single chunk the DMA framework will probably do merge the pages. The documentation encourages to merge the list but it is not obligatory.

The reason is that if 'struct scatterlist' contains no dma_length field, what is controlled by CONFIG_NEED_SG_DMA_LENGTH macro, then field length is used instead. Chunk marging cannot be done in such a case. This is the reason why I look for the longest contiguous block.

...

...
/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl); buf->size = size;
buf->dma_addr = dma_addr;

buf->vma = vma;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;

+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
I think this will break in the VM_PFNMAP case on non-coherent architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in that case, which can dereference struct page. As explain above, the struct page isn't valid with VM_PFNMAP. I haven't check the dma_map_sg() and dma_sync_sg_*() calls, but changes are they might break as well.

It will crash as long it is true that there is no struct page behind given pfn. In practice, I found that VM_PFNMAP means that one cannot assume that there is a 'struct page' behind PFNs of a given mapping. Thoses struct pages really exists for all our drivers. Anyway, I agree that using those pages is a hack.

It could be avoided if vb2_dc_get_pages returned a list of PFNs. Anyway, those PFNs have to be transformed to pages to create an sglist. Those pointers might be accessed somewhere deep inside dma_map_sg internals.

The quite good solution would be dropping support for VM_PFNMAP mappings since they cannot be handled reliably.

...

...
+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages && n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

-static void vb2_dc_put_userptr(void *mem_priv) +static void vb2_dc_put_userptr(void *buf_priv) {
struct vb2_dc_buf *buf = mem_priv;

if (!buf)
return;
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

dma_unmap_sg(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);

vb2_dc_put_sgtable(sgt, !vma_is_io(buf->vma)); vb2_put_vma(buf->vma); kfree(buf);

} @@ -168,6 +496,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = { .mmap = vb2_dc_mmap, .get_userptr = vb2_dc_get_userptr, .put_userptr = vb2_dc_put_userptr,

.prepare = vb2_dc_prepare,

.finish = vb2_dc_finish, .num_users = vb2_dc_num_users,

}; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

Regards, Tomasz Stanislawski

Laurent Pinchart

2:42 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Tomasz,

On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...

Hi Laurent, Thank you very much for your comments and question. They were very useful.

You're welcome.

...

Please refer to the comments below.

On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

...
From: Tomasz Stanislawski t.stanislaws@samsung.com

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

[snip]

...

...
...
diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c898e6f..9965465 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c

[snip]

...

...
...
static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int

n_pages;http://165.213.219.115/cgi-bin/gitweb.cgi?p=mirror/linux-3.0-mid as;a=shortlog;h=refs/heads/exynos-3.0-dev + struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,

GFP_KERNEL);

...

...
...
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,
GFP_KERNEL);
ret = -ENOMEM;

if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
return ERR_PTR(-ENOMEM);
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->dev = dev;

buf->size = size;
WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);

WARN_ON(buf->dma_addr & ~PAGE_MASK);

n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);
As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?
The physical address should not be used since they are meaningless in a context of different devices. It seams that only the list of pages is more-or-less portable between different drivers.

The pages are physically contiguous. The physical address of the first page is thus all you need.

struct page and physical addresses can be used interchangeably in this case if I'm not mistaken. If you want to go with pages, you could use the first page only instead of the physical buffer address.

...

The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

That's why buf->dma_addr can't be considered as a physical address. It's only useful in the device context.

...

The sg list is not going to be single-entry if the device is provided with its own MMU.

There's something I don't get then. vb2-dma-contig deals with physically contiguous buffers. The buffer is backed by physically contiguous pages, so the sg list should have a single entry.

...

...
...
if (ret < 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}
buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.
Good point. The sgt_base is used by exporter only. Thanks for noticing it.

...
...

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put;

[snip]

...

...
...
/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));

Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

Frankly, I found that that in get_user_pages flags are checked against (VM_IO | VM_PFNMAP). Probably for noMMU (not no IOMMU) case it is possible to get vma with VM_IO on and VM_PFNMAP off, isn't it?

The problem is that this framework should work in both cases so this check was added just in case :).

OK. We can leave it here and deal with problems if they arise :-)

...

...
...
+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

Good point. Do you know any good solution to this problem?

http://patchwork.linuxtv.org/patch/8455/

It seems QBUF is safe, but PREPAREBUF isn't (both call __buf_prepare, which end up calling the memops get_userptr operation).

I'll post a patch to fix it for PREPAREBUF. If I'm not mistaken, you can drop the down_read/up_read here.

...

...
...
vma = find_vma(current->mm, start);

if (!vma) {
printk(KERN_ERR "no vma for address %lu\n", start);
goto cleanup;
}

if (vma_is_io(vma)) {
unsigned long pfn;
if (vma->vm_end - start < n_pages * PAGE_SIZE) {
	printk(KERN_ERR "vma is too small\n");
	goto cleanup;
}
for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {
	ret = follow_pfn(vma, start, &pfn);
	if (ret) {
		printk(KERN_ERR "no page for address %lu\n",
			start);
		goto cleanup;
	}
	pages[n] = pfn_to_page(pfn);
	get_page(pages[n]);
This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.
You are completetly right. This is the corner case where list of pages is not a portable way of describing the memory. Maybe pfn_valid should be used to check validity of the page (pfn) before getting it?

I think you should just drop the get_page() call. There's no page, so there's no need to get a reference count to it.

The VM_PFNMAP flag is mostly used with memory out of the kernel allocator's control if I'm not mistaken. The main use case I've seen is memory reserved at boot time and use as a frame buffer for instance. In that case the pages can't go away, as there no page in the first place.

This won't fix the DMA SG problem though (see below).

...

...
...
}
} else {
n = get_user_pages(current, current->mm, start & PAGE_MASK,
	n_pages, write, 1, pages, NULL);
if (n != n_pages) {
	printk(KERN_ERR "got only %d of %d user pages\n",
		n, n_pages);
	goto cleanup;
}
}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {
printk(KERN_ERR "failed to copy vma\n");
ret = -ENOMEM;
goto cleanup;
}
Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.
I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.

OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this caught my eyes. If you find the reason why copying it is needed, please add a comment to the code.

...

...
...
/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;

+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
unsigned long size, int write)

{ struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0;

int ret;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf)

return ERR_PTR(-ENOMEM);

ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr & PAGE_MASK;

offset = (unsigned long)vaddr & ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write);

if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}
This looks overly complex to me. You create a multi-chunk sgt out of the user pointer address and map it completely, and then check if it starts with a big enough contiguous chunk.
Notice that vb2_dc_pages_to_sgt does compress contiguous ranges of pfns (pages). So if the memory is contiguous, then sigle chunk sglist is produced. The memory used to store pages list is just temporary. It is freed after the sglist is created.

That's exactly my point. The memory needs to be contiguous to be usable. If it isn't, vb2-dma-contig will only use the first contiguous chunk. We could thus simplify the code by hardcoding the single-chunk assumption. vb2-dma-contig would walk user user pages list (or the PFN, depending on the VMA flags) and stop at the first discontinuity. It would then create a single-entry sg list and operate on that, without mapping or otherwise touching the rest of the VMA, which is unusable to the device anyway.

...

...
Why don't you create an sgt with a single continuous chunk then ? In the VM_PFNMAP case you could check whether the area is contiguous when you follow the PFNs, stop at the first discontinuity, and create an sgt with a single element right there. You would then need to call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first discontinuity as well.

Discontinuity of pfns is not a problem if a device has own IOMMU. It is not known for vb2-dma-contig if mapping this multi-chunk sglist will succeed until calling and checking a result of dma_map_sg.

If the device has an IOMMU it won't need contiguous memory. Shouldn't it then use vb2-dma-sg instead ?

...

Why bothering if both VM_PFNMAP and non-VM_PFNMAP are handled in the same way after list of pages is obtained? Trating them the same way allows to reuse code and simplify the program flow.

The DMA framework does not provide any way to force single chunk mapping in sg. If the device is capable of mapping discontinous pages into a single chunk the DMA framework will probably do merge the pages. The documentation encourages to merge the list but it is not obligatory.

The reason is that if 'struct scatterlist' contains no dma_length field, what is controlled by CONFIG_NEED_SG_DMA_LENGTH macro, then field length is used instead. Chunk marging cannot be done in such a case. This is the reason why I look for the longest contiguous block.

...
...
/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->size = size;
buf->dma_addr = dma_addr;

buf->vma = vma;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;

+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
I think this will break in the VM_PFNMAP case on non-coherent architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in that case, which can dereference struct page. As explain above, the struct page isn't valid with VM_PFNMAP. I haven't check the dma_map_sg() and dma_sync_sg_*() calls, but changes are they might break as well.
It will crash as long it is true that there is no struct page behind given pfn. In practice, I found that VM_PFNMAP means that one cannot assume that there is a 'struct page' behind PFNs of a given mapping. Thoses struct pages really exists for all our drivers. Anyway, I agree that using those pages is a hack.

They don't exist for the memory used as frame buffer on the OMAP3 (or at least didn't exist in the N900 and N9, I haven't checked since). This could become just a bad distant memory when drivers will use CMA.

...

It could be avoided if vb2_dc_get_pages returned a list of PFNs. Anyway, those PFNs have to be transformed to pages to create an sglist. Those pointers might be accessed somewhere deep inside dma_map_sg internals.

The quite good solution would be dropping support for VM_PFNMAP mappings since they cannot be handled reliably.

We should either drop VM_PFNMAP support or fix the DMA SG mapping API to support VM_PFNMAP-style memory. I would vote for the former, as that's way simpler and we have no VM_PFNMAP use case right now.

...

...
...
+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages && n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

-- Regards, Laurent Pinchart

Subash Patel

2:52 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Laurent,

On 03/22/2012 08:12 PM, Laurent Pinchart wrote:

...

Hi Tomasz,

On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...
Hi Laurent, Thank you very much for your comments and question. They were very useful.

You're welcome.

...
Please refer to the comments below.

On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

...
From: Tomasz Stanislawskit.stanislaws@samsung.com

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

[snip]

...
...
...
diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c898e6f..9965465 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c

[snip]

...
...
...
static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) {

struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int

n_pages;http://165.213.219.115/cgi-bin/gitweb.cgi?p=mirror/linux-3.0-mid as;a=shortlog;h=refs/heads/exynos-3.0-dev + struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size,&buf->dma_addr,

GFP_KERNEL);

...
...
...
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size,&buf->dma_addr,
GFP_KERNEL);
ret = -ENOMEM;

if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
return ERR_PTR(-ENOMEM);
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->dev = dev;

buf->size = size;
WARN_ON((unsigned long)buf->vaddr& ~PAGE_MASK);

WARN_ON(buf->dma_addr& ~PAGE_MASK);

n_pages = PAGE_ALIGN(size)>> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);
As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?
The physical address should not be used since they are meaningless in a context of different devices. It seams that only the list of pages is more-or-less portable between different drivers.
The pages are physically contiguous. The physical address of the first page is thus all you need.

struct page and physical addresses can be used interchangeably in this case if I'm not mistaken. If you want to go with pages, you could use the first page only instead of the physical buffer address.

...
The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

That's why buf->dma_addr can't be considered as a physical address. It's only useful in the device context.

...
The sg list is not going to be single-entry if the device is provided with its own MMU.

There's something I don't get then. vb2-dma-contig deals with physically contiguous buffers. The buffer is backed by physically contiguous pages, so the sg list should have a single entry.

I think at present, vb2-dma-contig is abused for any kind of memory allocation (continuous or not). Wouldnt it be good to have a proper working setup for videobuf2-dma-sg instead? Driver which chooses to use continuous, shall assign vb2_queue->mem_ops = vb2_dma_contig_memops. The devices which know they have a MMU backing can assign the same to vb2_dma_sg_memops. But as of now, we try to use vb2_dma_contig_memops for all kind of operation. I have also done this mistake, and wish I repaired it and posted it before :(

...

...
...
...
if (ret< 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}
buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.
Good point. The sgt_base is used by exporter only. Thanks for noticing it.

...
...

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount =&buf->refcount; buf->handler.put = vb2_dc_put;
[snip]

...
...
...
/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags& (VM_IO | VM_PFNMAP));

Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

Frankly, I found that that in get_user_pages flags are checked against (VM_IO | VM_PFNMAP). Probably for noMMU (not no IOMMU) case it is possible to get vma with VM_IO on and VM_PFNMAP off, isn't it?

The problem is that this framework should work in both cases so this check was added just in case :).

OK. We can leave it here and deal with problems if they arise :-)

...
...
...
+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

Good point. Do you know any good solution to this problem?

http://patchwork.linuxtv.org/patch/8455/

It seems QBUF is safe, but PREPAREBUF isn't (both call __buf_prepare, which end up calling the memops get_userptr operation).

I'll post a patch to fix it for PREPAREBUF. If I'm not mistaken, you can drop the down_read/up_read here.

...
...
...
vma = find_vma(current->mm, start);

if (!vma) {
printk(KERN_ERR "no vma for address %lu\n", start);
goto cleanup;
}

if (vma_is_io(vma)) {
unsigned long pfn;
if (vma->vm_end - start<  n_pages * PAGE_SIZE) {
	printk(KERN_ERR "vma is too small\n");
	goto cleanup;
}
for (n = 0; n<  n_pages; ++n, start += PAGE_SIZE) {
	ret = follow_pfn(vma, start,&pfn);
	if (ret) {
		printk(KERN_ERR "no page for address %lu\n",
			start);
		goto cleanup;
	}
	pages[n] = pfn_to_page(pfn);
	get_page(pages[n]);
This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.
You are completetly right. This is the corner case where list of pages is not a portable way of describing the memory. Maybe pfn_valid should be used to check validity of the page (pfn) before getting it?
I think you should just drop the get_page() call. There's no page, so there's no need to get a reference count to it.

The VM_PFNMAP flag is mostly used with memory out of the kernel allocator's control if I'm not mistaken. The main use case I've seen is memory reserved at boot time and use as a frame buffer for instance. In that case the pages can't go away, as there no page in the first place.

This won't fix the DMA SG problem though (see below).

...
...
...
}
} else {
n = get_user_pages(current, current->mm, start&  PAGE_MASK,
	n_pages, write, 1, pages, NULL);
if (n != n_pages) {
	printk(KERN_ERR "got only %d of %d user pages\n",
		n, n_pages);
	goto cleanup;
}
}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {
printk(KERN_ERR "failed to copy vma\n");
ret = -ENOMEM;
goto cleanup;
}
Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.
I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.
OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this caught my eyes. If you find the reason why copying it is needed, please add a comment to the code.

...
...
...
/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;

+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
unsigned long size, int write) {

struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0;

int ret;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf)

return ERR_PTR(-ENOMEM);

ret = vb2_get_contig_userptr(vaddr, size,&vma,&dma_addr);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr& PAGE_MASK;

offset = (unsigned long)vaddr& ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start)>> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages,&buf->vma, write);

if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}
This looks overly complex to me. You create a multi-chunk sgt out of the user pointer address and map it completely, and then check if it starts with a big enough contiguous chunk.
Notice that vb2_dc_pages_to_sgt does compress contiguous ranges of pfns (pages). So if the memory is contiguous, then sigle chunk sglist is produced. The memory used to store pages list is just temporary. It is freed after the sglist is created.
That's exactly my point. The memory needs to be contiguous to be usable. If it isn't, vb2-dma-contig will only use the first contiguous chunk. We could thus simplify the code by hardcoding the single-chunk assumption. vb2-dma-contig would walk user user pages list (or the PFN, depending on the VMA flags) and stop at the first discontinuity. It would then create a single-entry sg list and operate on that, without mapping or otherwise touching the rest of the VMA, which is unusable to the device anyway.

...
...
Why don't you create an sgt with a single continuous chunk then ? In the VM_PFNMAP case you could check whether the area is contiguous when you follow the PFNs, stop at the first discontinuity, and create an sgt with a single element right there. You would then need to call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first discontinuity as well.

Discontinuity of pfns is not a problem if a device has own IOMMU. It is not known for vb2-dma-contig if mapping this multi-chunk sglist will succeed until calling and checking a result of dma_map_sg.

If the device has an IOMMU it won't need contiguous memory. Shouldn't it then use vb2-dma-sg instead ?

...
Why bothering if both VM_PFNMAP and non-VM_PFNMAP are handled in the same way after list of pages is obtained? Trating them the same way allows to reuse code and simplify the program flow.

The DMA framework does not provide any way to force single chunk mapping in sg. If the device is capable of mapping discontinous pages into a single chunk the DMA framework will probably do merge the pages. The documentation encourages to merge the list but it is not obligatory.

The reason is that if 'struct scatterlist' contains no dma_length field, what is controlled by CONFIG_NEED_SG_DMA_LENGTH macro, then field length is used instead. Chunk marging cannot be done in such a case. This is the reason why I look for the longest contiguous block.

...
...
/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents<= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size< size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->size = size;
buf->dma_addr = dma_addr;

buf->vma = vma;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;

+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
I think this will break in the VM_PFNMAP case on non-coherent architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in that case, which can dereference struct page. As explain above, the struct page isn't valid with VM_PFNMAP. I haven't check the dma_map_sg() and dma_sync_sg_*() calls, but changes are they might break as well.
It will crash as long it is true that there is no struct page behind given pfn. In practice, I found that VM_PFNMAP means that one cannot assume that there is a 'struct page' behind PFNs of a given mapping. Thoses struct pages really exists for all our drivers. Anyway, I agree that using those pages is a hack.
They don't exist for the memory used as frame buffer on the OMAP3 (or at least didn't exist in the N900 and N9, I haven't checked since). This could become just a bad distant memory when drivers will use CMA.

...
It could be avoided if vb2_dc_get_pages returned a list of PFNs. Anyway, those PFNs have to be transformed to pages to create an sglist. Those pointers might be accessed somewhere deep inside dma_map_sg internals.

The quite good solution would be dropping support for VM_PFNMAP mappings since they cannot be handled reliably.

We should either drop VM_PFNMAP support or fix the DMA SG mapping API to support VM_PFNMAP-style memory. I would vote for the former, as that's way simpler and we have no VM_PFNMAP use case right now.

...
...
...
+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages&& n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

Regards, Subash

Tomasz Stanislawski

4:18 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

On 03/22/2012 03:52 PM, Subash Patel wrote:

...

Hi Laurent,

On 03/22/2012 08:12 PM, Laurent Pinchart wrote:

...
Hi Tomasz,

On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...
Hi Laurent, Thank you very much for your comments and question. They were very useful.

You're welcome.

...
Please refer to the comments below.

On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

...
From: Tomasz Stanislawskit.stanislaws@samsung.com

[snip]

...

...
...
The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

That's why buf->dma_addr can't be considered as a physical address. It's only useful in the device context.

...
The sg list is not going to be single-entry if the device is provided with its own MMU.

There's something I don't get then. vb2-dma-contig deals with physically contiguous buffers. The buffer is backed by physically contiguous pages, so the sg list should have a single entry.

I think at present, vb2-dma-contig is abused for any kind of memory allocation (continuous or not). Wouldnt it be good to have a proper working setup for videobuf2-dma-sg instead? Driver which chooses to use continuous, shall assign vb2_queue->mem_ops = vb2_dma_contig_memops. The devices which know they have a MMU backing can assign the same to vb2_dma_sg_memops. But as of now, we try to use vb2_dma_contig_memops for all kind of operation. I have also done this mistake, and wish I repaired it and posted it before :(

Hi Subash,

At first, I do not think that vb2-dma-contig is abused for any kind of allocation. It is used only for DMA coherent contiguous mapping what is very close to vb2-dma-contig's original purpose.

One thing has to be said loudly:

"The driver does not need to and should not know if IOMMU is present or not."

The DMA framework must know if device uses IOMMU or not. Reason: the memory is allocated, free, flushed, mapped only by DMA framework.

Usage of vb2-dma-contig or vb2-dma-sg depends only on the way how the memory is configured into the device. Most of embedded devices use only buffer's start address and buffer size (often indirectly specified by width/height and format). It mean that device needs a buffer that is contiguous for its DMA engine.

In such a case, the driver should ALWAYS use vb2-dma-contig.

The DMA framework hidden deep in dma_alloc_coherent internals will deal with IOMMU configuration.

I hope you find this information useful.

Regards, Tomasz Stanislawski

Tomasz Stanislawski

3:58 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Laurent,

On 03/22/2012 03:42 PM, Laurent Pinchart wrote:

...

Hi Tomasz,

On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...
Hi Laurent, Thank you very much for your comments and question. They were very useful.

You're welcome.

...
Please refer to the comments below.

On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

...
From: Tomasz Stanislawski t.stanislaws@samsung.com

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

[snip]

...
...
...
diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c898e6f..9965465 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c

[snip]

...
...
...
static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int

n_pages;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,

GFP_KERNEL);

...
...
...
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,
GFP_KERNEL);
ret = -ENOMEM;

if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
return ERR_PTR(-ENOMEM);
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->dev = dev;

buf->size = size;
WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);

WARN_ON(buf->dma_addr & ~PAGE_MASK);

n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);
As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?
The physical address should not be used since they are meaningless in a context of different devices. It seams that only the list of pages is more-or-less portable between different drivers.
The pages are physically contiguous. The physical address of the first page is thus all you need.

No. DMA-CONTIG buffers do not have to be physically contiguous. Please refer below.

...

struct page and physical addresses can be used interchangeably in this case if I'm not mistaken. If you want to go with pages, you could use the first page only instead of the physical buffer address.

Ok. There are bus addresses, physical addresses, DMA addresses and PFNs. As I understand PFNs and 'struct page' can be interchanged, at least in one direction. The PFNs are used to create a bus address, I mean addresses that are recognized by a RAM chip. So a list of PFNs seams to be the most portable way of describing the memory, isn't it?

...

...
The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

That's why buf->dma_addr can't be considered as a physical address. It's only useful in the device context.

...

...
The sg list is not going to be single-entry if the device is provided with its own MMU.

There's something I don't get then. vb2-dma-contig deals with physically contiguous buffers. The buffer is backed by physically contiguous pages, so the sg list should have a single entry.

As I understand dma-contig deal with DMA contiguous buffers, it means buffers that are contiguous from device point of view. Therefore those buffers do NOT have to be physically contiguous if the device has its own IOMMU.

...

...
...
...
if (ret < 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}
buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.
Good point. The sgt_base is used by exporter only. Thanks for noticing it.

...
...

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put;
[snip]

...
...
...
/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));

Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

Frankly, I found that that in get_user_pages flags are checked against (VM_IO | VM_PFNMAP). Probably for noMMU (not no IOMMU) case it is possible to get vma with VM_IO on and VM_PFNMAP off, isn't it?

The problem is that this framework should work in both cases so this check was added just in case :).

OK. We can leave it here and deal with problems if they arise :-)

...
...
...
+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

Good point. Do you know any good solution to this problem?

http://patchwork.linuxtv.org/patch/8455/

It seems QBUF is safe, but PREPAREBUF isn't (both call __buf_prepare, which end up calling the memops get_userptr operation).

I'll post a patch to fix it for PREPAREBUF. If I'm not mistaken, you can drop the down_read/up_read here.

ok. Thanks for the link.

...

...
...
...
vma = find_vma(current->mm, start);

if (!vma) {
printk(KERN_ERR "no vma for address %lu\n", start);
goto cleanup;
}

if (vma_is_io(vma)) {
unsigned long pfn;
if (vma->vm_end - start < n_pages * PAGE_SIZE) {
	printk(KERN_ERR "vma is too small\n");
	goto cleanup;
}
for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {
	ret = follow_pfn(vma, start, &pfn);
	if (ret) {
		printk(KERN_ERR "no page for address %lu\n",
			start);
		goto cleanup;
	}
	pages[n] = pfn_to_page(pfn);
	get_page(pages[n]);
This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.
You are completetly right. This is the corner case where list of pages is not a portable way of describing the memory. Maybe pfn_valid should be used to check validity of the page (pfn) before getting it?
I think you should just drop the get_page() call. There's no page, so there's no need to get a reference count to it.

The problem is that get_user_pages does call get_page. Not calling get_page will break the symmetry between PFNMAP and non-PFNMAP buffers. Maybe checking page validity before get_page/put_page is enough?

...

The VM_PFNMAP flag is mostly used with memory out of the kernel allocator's control if I'm not mistaken. The main use case I've seen is memory reserved at boot time and use as a frame buffer for instance. In that case the pages can't go away, as there no page in the first place.

This won't fix the DMA SG problem though (see below).

...
...
...
}
} else {
n = get_user_pages(current, current->mm, start & PAGE_MASK,
	n_pages, write, 1, pages, NULL);
if (n != n_pages) {
	printk(KERN_ERR "got only %d of %d user pages\n",
		n, n_pages);
	goto cleanup;
}
}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {
printk(KERN_ERR "failed to copy vma\n");
ret = -ENOMEM;
goto cleanup;
}
Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.
I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.
OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this caught my eyes. If you find the reason why copying it is needed, please add a comment to the code.

The reason of copying vma was that 'struct vma' has no reference counters. Therefore it could be deleted after mm lock is freed, ending with freeing its all pages belonging to vma. To prevent it, a copy of vma is created. Notice that inside vb2_get_vma the callback open is called for original vma, preventing memory from being released. On vb2_put_vma the complementary close is called.

...

...
...
...
/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;

+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
unsigned long size, int write)

{ struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0;

int ret;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf)

return ERR_PTR(-ENOMEM);

ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr & PAGE_MASK;

offset = (unsigned long)vaddr & ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write);

if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}
This looks overly complex to me. You create a multi-chunk sgt out of the user pointer address and map it completely, and then check if it starts with a big enough contiguous chunk.
Notice that vb2_dc_pages_to_sgt does compress contiguous ranges of pfns (pages). So if the memory is contiguous, then sigle chunk sglist is produced. The memory used to store pages list is just temporary. It is freed after the sglist is created.
That's exactly my point. The memory needs to be contiguous to be usable. If it isn't, vb2-dma-contig will only use the first contiguous chunk. We could thus simplify the code by hardcoding the single-chunk assumption. vb2-dma-contig would walk user user pages list (or the PFN, depending on the VMA flags) and stop at the first discontinuity. It would then create a single-entry sg list and operate on that, without mapping or otherwise touching the rest of the VMA, which is unusable to the device anyway.

[see above]

...

...
...
Why don't you create an sgt with a single continuous chunk then ? In the VM_PFNMAP case you could check whether the area is contiguous when you follow the PFNs, stop at the first discontinuity, and create an sgt with a single element right there. You would then need to call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first discontinuity as well.

Discontinuity of pfns is not a problem if a device has own IOMMU. It is not known for vb2-dma-contig if mapping this multi-chunk sglist will succeed until calling and checking a result of dma_map_sg.

If the device has an IOMMU it won't need contiguous memory. Shouldn't it then use vb2-dma-sg instead ?

No. The vb2-dma-sg is used for devices that can deal with memory accessed from multiple chunks. For example a device with attached DMA engine that can handle a scattergather list.

...

...
Why bothering if both VM_PFNMAP and non-VM_PFNMAP are handled in the same way after list of pages is obtained? Trating them the same way allows to reuse code and simplify the program flow.

The DMA framework does not provide any way to force single chunk mapping in sg. If the device is capable of mapping discontinous pages into a single chunk the DMA framework will probably do merge the pages. The documentation encourages to merge the list but it is not obligatory.

The reason is that if 'struct scatterlist' contains no dma_length field, what is controlled by CONFIG_NEED_SG_DMA_LENGTH macro, then field length is used instead. Chunk marging cannot be done in such a case. This is the reason why I look for the longest contiguous block.

...
...
/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->size = size;
buf->dma_addr = dma_addr;

buf->vma = vma;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;

+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
I think this will break in the VM_PFNMAP case on non-coherent architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in that case, which can dereference struct page. As explain above, the struct page isn't valid with VM_PFNMAP. I haven't check the dma_map_sg() and dma_sync_sg_*() calls, but changes are they might break as well.
It will crash as long it is true that there is no struct page behind given pfn. In practice, I found that VM_PFNMAP means that one cannot assume that there is a 'struct page' behind PFNs of a given mapping. Thoses struct pages really exists for all our drivers. Anyway, I agree that using those pages is a hack.
They don't exist for the memory used as frame buffer on the OMAP3 (or at least didn't exist in the N900 and N9, I haven't checked since). This could become just a bad distant memory when drivers will use CMA.

I see. That is why one should check page validity before calling get_pages. Other way would be splitting logic between PFNMAP and non-PFNMAP userspace mappings.

...

...
It could be avoided if vb2_dc_get_pages returned a list of PFNs. Anyway, those PFNs have to be transformed to pages to create an sglist. Those pointers might be accessed somewhere deep inside dma_map_sg internals.

The quite good solution would be dropping support for VM_PFNMAP mappings since they cannot be handled reliably.

We should either drop VM_PFNMAP support or fix the DMA SG mapping API to support VM_PFNMAP-style memory. I would vote for the former, as that's way simpler and we have no VM_PFNMAP use case right now.

Good point. But this way we have new dependencies on DMA mapping framework.

...

...
...
...
+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages && n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

Regards, Tomasz Stanislawski

Laurent Pinchart

27 Mar 27 Mar

3:01 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Tomasz,

On Thursday 22 March 2012 16:58:27 Tomasz Stanislawski wrote:

...

On 03/22/2012 03:42 PM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...
On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

[snip]

...

...
...
...
...
static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) { struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int

n_pages;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,

GFP_KERNEL);

...
...
...
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,
GFP_KERNEL);
ret = -ENOMEM;

if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
return ERR_PTR(-ENOMEM);
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->dev = dev;

buf->size = size;
WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);

WARN_ON(buf->dma_addr & ~PAGE_MASK);

n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);
As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?
The physical address should not be used since they are meaningless in a context of different devices. It seams that only the list of pages is more-or-less portable between different drivers.
The pages are physically contiguous. The physical address of the first page is thus all you need.
No. DMA-CONTIG buffers do not have to be physically contiguous. Please refer below.

...
struct page and physical addresses can be used interchangeably in this case if I'm not mistaken. If you want to go with pages, you could use the first page only instead of the physical buffer address.

Ok. There are bus addresses, physical addresses, DMA addresses and PFNs. As I understand PFNs and 'struct page' can be interchanged, at least in one direction. The PFNs are used to create a bus address, I mean addresses that are recognized by a RAM chip. So a list of PFNs seams to be the most portable way of describing the memory, isn't it?

...
...
The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

That's why buf->dma_addr can't be considered as a physical address. It's only useful in the device context.

ok

...
...
The sg list is not going to be single-entry if the device is provided with its own MMU.

There's something I don't get then. vb2-dma-contig deals with physically contiguous buffers. The buffer is backed by physically contiguous pages, so the sg list should have a single entry.

As I understand dma-contig deal with DMA contiguous buffers, it means buffers that are contiguous from device point of view. Therefore those buffers do NOT have to be physically contiguous if the device has its own IOMMU.

My bad. There was thus a misunderstanding to begin with.

In the light of this new information, and (at least partially) sharing Daniel's opinion regarding dma_get_pages(), I think what we need here would be either

- a DMA API call that maps the memory to the importer device instead of dma_get_pages() + vb2_dc_pages_to_sgt(). The call would take a DMA memory "cookie" (see the "Minutes from V4L2 update call" mail thread) and a pointer to the importer device.

- a DMA API call to retrieve a scatter list suitable to be passed to dma_map_sg(). This would be similar to dma_get_pages() + vb2_dc_pages_to_sgt().

We also need to figure out whether the mapping call should be in the exporter or importer.

...

...
...
...
...
if (ret < 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}
buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.
Good point. The sgt_base is used by exporter only. Thanks for noticing it.

...
...

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put;
[snip]

...
...
...
/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));

Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

Frankly, I found that that in get_user_pages flags are checked against (VM_IO | VM_PFNMAP). Probably for noMMU (not no IOMMU) case it is possible to get vma with VM_IO on and VM_PFNMAP off, isn't it?

The problem is that this framework should work in both cases so this check was added just in case :).

OK. We can leave it here and deal with problems if they arise :-)

...
...
...
+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

Good point. Do you know any good solution to this problem?

http://patchwork.linuxtv.org/patch/8455/

It seems QBUF is safe, but PREPAREBUF isn't (both call __buf_prepare, which end up calling the memops get_userptr operation).

I'll post a patch to fix it for PREPAREBUF. If I'm not mistaken, you can drop the down_read/up_read here.
ok. Thanks for the link.

...
...
...
...
vma = find_vma(current->mm, start);

if (!vma) {
printk(KERN_ERR "no vma for address %lu\n", start);
goto cleanup;
}

if (vma_is_io(vma)) {
unsigned long pfn;
if (vma->vm_end - start < n_pages * PAGE_SIZE) {
	printk(KERN_ERR "vma is too small\n");
	goto cleanup;
}
for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {
	ret = follow_pfn(vma, start, &pfn);
	if (ret) {
		printk(KERN_ERR "no page for address %lu\n",
			start);
		goto cleanup;
	}
	pages[n] = pfn_to_page(pfn);
	get_page(pages[n]);
This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.
You are completetly right. This is the corner case where list of pages is not a portable way of describing the memory. Maybe pfn_valid should be used to check validity of the page (pfn) before getting it?
I think you should just drop the get_page() call. There's no page, so there's no need to get a reference count to it.
The problem is that get_user_pages does call get_page. Not calling get_page will break the symmetry between PFNMAP and non-PFNMAP buffers. Maybe checking page validity before get_page/put_page is enough?

PFNMAP and non-PFNMAP buffers are inherently different, so I don't see a problem in handling them differently. We will likely run into an issue though, with hardware such as the OMAP TILER, where the memory isn't backed by normal memory (and thus no struct page is present), but for which the target must be pinned somehow (in the case of the tiler, that would be a tiler mapping). I don't think we have an API to ask the kernel to pin a memory range regardless of how the memory is handled (system memory, reserved memory with PFNMAP, device mapping such as with the tiler, ...). This is an open issue. One possible solution is to deprecate USERPTR support for that kind of memory and use dma-buf instead.

...

...
The VM_PFNMAP flag is mostly used with memory out of the kernel allocator's control if I'm not mistaken. The main use case I've seen is memory reserved at boot time and use as a frame buffer for instance. In that case the pages can't go away, as there no page in the first place.

This won't fix the DMA SG problem though (see below).

...
...
...
}
} else {
n = get_user_pages(current, current->mm, start & PAGE_MASK,
	n_pages, write, 1, pages, NULL);
if (n != n_pages) {
	printk(KERN_ERR "got only %d of %d user pages\n",
		n, n_pages);
	goto cleanup;
}
}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {
printk(KERN_ERR "failed to copy vma\n");
ret = -ENOMEM;
goto cleanup;
}
Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.
I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.
OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this caught my eyes. If you find the reason why copying it is needed, please add a comment to the code.
The reason of copying vma was that 'struct vma' has no reference counters. Therefore it could be deleted after mm lock is freed, ending with freeing its all pages belonging to vma. To prevent it, a copy of vma is created. Notice that inside vb2_get_vma the callback open is called for original vma, preventing memory from being released. On vb2_put_vma the complementary close is called.

Feel free to prove me wrong, but I think get_user_pages() is enough to prevent the pages from being freed, even if the VMA is deleted.

However, there's one subtle issue that we will need to deal with when we will implement cache management. It took me a lot of time to debug and fix it when I was working on the OMAP3 ISP driver, so I'll explain it in the hope that someone will find it later when dealing with the same problems :-)

When a buffer is queued, the OMAP3 ISP driver cleans up the cache using the userspace mapping addresses (for USERPTR buffers). This might be a bad thing, but that's the way we currently implement that.

A prior get_user_pages() call will make sure pages are not freed, but due to optimization in the lockless memory management algorithms the userspace mapping can disappear: the kernel might consider that a page can be freed, remove its userspace mapping, and then find out that the page is locked. It will then move on without restoring the userspace mapping, which will be restored when the next page fault occurs.

When cleaning the cache using the userspace mapping addresses, any page for which the userspace mapping has been removed will trigger a page fault. The page fault handler (do_page_fault() in arm/arch/mm/fault.c) will try to read_lock mmap_sem. If it fails, it will check if the page fault occured in userspace context, or from a known exception location. As neither condition is true, it will panic.

The solution I found to this issue was to lock the VMA. This ensured that the userspace mapping would stay in place. See isp_video_buffer_lock_vma() in drivers/media/video/omap3isp/ispqueue.c. You could use a similar approach here if you want to ensure that userspace mappings are not removed, but once again I don't think that's needed (until we get to cache handling) as get_user_pages() will ensure that the pages are not freed.

...

...
...
...
...
/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;

+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
unsigned long size, int write)

{ struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0;

int ret;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf)

return ERR_PTR(-ENOMEM);

ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr & PAGE_MASK;

offset = (unsigned long)vaddr & ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write);

if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}
This looks overly complex to me. You create a multi-chunk sgt out of the user pointer address and map it completely, and then check if it starts with a big enough contiguous chunk.
Notice that vb2_dc_pages_to_sgt does compress contiguous ranges of pfns (pages). So if the memory is contiguous, then sigle chunk sglist is produced. The memory used to store pages list is just temporary. It is freed after the sglist is created.
That's exactly my point. The memory needs to be contiguous to be usable. If it isn't, vb2-dma-contig will only use the first contiguous chunk. We could thus simplify the code by hardcoding the single-chunk assumption. vb2-dma-contig would walk user user pages list (or the PFN, depending on the VMA flags) and stop at the first discontinuity. It would then create a single-entry sg list and operate on that, without mapping or otherwise touching the rest of the VMA, which is unusable to the device anyway.
[see above]

...
...
...
Why don't you create an sgt with a single continuous chunk then ? In the VM_PFNMAP case you could check whether the area is contiguous when you follow the PFNs, stop at the first discontinuity, and create an sgt with a single element right there. You would then need to call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first discontinuity as well.

Discontinuity of pfns is not a problem if a device has own IOMMU. It is not known for vb2-dma-contig if mapping this multi-chunk sglist will succeed until calling and checking a result of dma_map_sg.

If the device has an IOMMU it won't need contiguous memory. Shouldn't it then use vb2-dma-sg instead ?

No. The vb2-dma-sg is used for devices that can deal with memory accessed from multiple chunks. For example a device with attached DMA engine that can handle a scattergather list.

...
...
Why bothering if both VM_PFNMAP and non-VM_PFNMAP are handled in the same way after list of pages is obtained? Trating them the same way allows to reuse code and simplify the program flow.

The DMA framework does not provide any way to force single chunk mapping in sg. If the device is capable of mapping discontinous pages into a single chunk the DMA framework will probably do merge the pages. The documentation encourages to merge the list but it is not obligatory.

The reason is that if 'struct scatterlist' contains no dma_length field, what is controlled by CONFIG_NEED_SG_DMA_LENGTH macro, then field length is used instead. Chunk marging cannot be done in such a case. This is the reason why I look for the longest contiguous block.

...
...
/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->size = size;
buf->dma_addr = dma_addr;

buf->vma = vma;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;

+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
I think this will break in the VM_PFNMAP case on non-coherent architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in that case, which can dereference struct page. As explain above, the struct page isn't valid with VM_PFNMAP. I haven't check the dma_map_sg() and dma_sync_sg_*() calls, but changes are they might break as well.
It will crash as long it is true that there is no struct page behind given pfn. In practice, I found that VM_PFNMAP means that one cannot assume that there is a 'struct page' behind PFNs of a given mapping. Thoses struct pages really exists for all our drivers. Anyway, I agree that using those pages is a hack.
They don't exist for the memory used as frame buffer on the OMAP3 (or at least didn't exist in the N900 and N9, I haven't checked since). This could become just a bad distant memory when drivers will use CMA.
I see. That is why one should check page validity before calling get_pages. Other way would be splitting logic between PFNMAP and non-PFNMAP userspace mappings.

The OMAP3 ISP driver splits the logic for the PFNMAP and non-PFNMAP cases and the code is quite clean (at least in my opinion ;-)). I can give you a quick tour around its allocator if you want. I would go for that solution here as well.

...

...
...
It could be avoided if vb2_dc_get_pages returned a list of PFNs. Anyway, those PFNs have to be transformed to pages to create an sglist. Those pointers might be accessed somewhere deep inside dma_map_sg internals.

The quite good solution would be dropping support for VM_PFNMAP mappings since they cannot be handled reliably.

We should either drop VM_PFNMAP support or fix the DMA SG mapping API to support VM_PFNMAP-style memory. I would vote for the former, as that's way simpler and we have no VM_PFNMAP use case right now.

Good point. But this way we have new dependencies on DMA mapping framework.

...
...
...
...
+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages && n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

}

-- Regards, Laurent Pinchart

Jerome Glisse

4:45 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

On Tue, Mar 27, 2012 at 11:01 AM, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:

...

Hi Tomasz,

On Thursday 22 March 2012 16:58:27 Tomasz Stanislawski wrote:

...
On 03/22/2012 03:42 PM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...
On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

[snip]

...
...
...
...
...
static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) {

struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

int ret;

int

n_pages;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf)

return ERR_PTR(-ENOMEM);

buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,

GFP_KERNEL);

...
...
...

buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,

GFP_KERNEL);

ret = -ENOMEM;

if (!buf->vaddr) {

dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);

kfree(buf);

return ERR_PTR(-ENOMEM);

dev_err(dev, "dma_alloc_coherent of size %ld failed\n",

size);

goto fail_buf;

}

buf->dev = dev;

buf->size = size;

WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);

WARN_ON(buf->dma_addr & ~PAGE_MASK);

n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {

printk(KERN_ERR "failed to alloc page table\n");

goto fail_dma;

}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);

As the only purpose of this is to retrieve a list of pages that will be used to create a single-entry sgt, wouldn't it be possible to shortcut the code and get the physical address of the buffer directly ?

The physical address should not be used since they are meaningless in a context of different devices. It seams that only the list of pages is more-or-less portable between different drivers.

The pages are physically contiguous. The physical address of the first page is thus all you need.

No. DMA-CONTIG buffers do not have to be physically contiguous. Please refer below.

...
struct page and physical addresses can be used interchangeably in this case if I'm not mistaken. If you want to go with pages, you could use the first page only instead of the physical buffer address.

Ok. There are bus addresses, physical addresses, DMA addresses and PFNs. As I understand PFNs and 'struct page' can be interchanged, at least in one direction. The PFNs are used to create a bus address, I mean addresses that are recognized by a RAM chip. So a list of PFNs seams to be the most portable way of describing the memory, isn't it?

...
...
The physical address is already present in buf->dma_addr, but it is only valid if the device has no MMU. Notice that vb2-dma-contig possess no knowledge if MMU is present for a given device.

That's why buf->dma_addr can't be considered as a physical address. It's only useful in the device context.

ok

...
...
The sg list is not going to be single-entry if the device is provided with its own MMU.

There's something I don't get then. vb2-dma-contig deals with physically contiguous buffers. The buffer is backed by physically contiguous pages, so the sg list should have a single entry.

As I understand dma-contig deal with DMA contiguous buffers, it means buffers that are contiguous from device point of view. Therefore those buffers do NOT have to be physically contiguous if the device has its own IOMMU.

My bad. There was thus a misunderstanding to begin with.

In the light of this new information, and (at least partially) sharing Daniel's opinion regarding dma_get_pages(), I think what we need here would be either

a DMA API call that maps the memory to the importer device instead of

dma_get_pages() + vb2_dc_pages_to_sgt(). The call would take a DMA memory "cookie" (see the "Minutes from V4L2 update call" mail thread) and a pointer to the importer device.

a DMA API call to retrieve a scatter list suitable to be passed to

dma_map_sg(). This would be similar to dma_get_pages() + vb2_dc_pages_to_sgt().

We also need to figure out whether the mapping call should be in the exporter or importer.

...
...
...
...
...

if (ret < 0) {

printk(KERN_ERR "failed to get buffer pages from DMA API\n");

goto fail_pages;

}

if (ret != n_pages) {

ret = -EFAULT;

printk(KERN_ERR "failed to get all pages from DMA API\n");

goto fail_pages;

}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {

ret = PTR_ERR(buf->sgt_base);

printk(KERN_ERR "failed to prepare sg table\n");

goto fail_pages;

}

buf->sgt_base isn't used in this patch. I would move the buf->sgt_base creation code to the patch that uses it then, or to its own patch just before the patch that uses it.

Good point. The sgt_base is used by exporter only. Thanks for noticing it.

...
...

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_dc_put;

[snip]

...
...
...
/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) +{

return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));

Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to get a discontinuous physical range with VM_IO ?

Frankly, I found that that in get_user_pages flags are checked against (VM_IO | VM_PFNMAP). Probably for noMMU (not no IOMMU) case it is possible to get vma with VM_IO on and VM_PFNMAP off, isn't it?

The problem is that this framework should work in both cases so this check was added just in case :).

OK. We can leave it here and deal with problems if they arise :-)

...
...
...
+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{

struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

This will generate AB-BA deadlock warnings if lockdep is enabled. This function is called with the queue lock held, and the mmap() handler which takes the queue lock is called with current->mm->mmap_sem held.

This is a known issue with videobuf2, not specific to this patch. The warning is usually a false positive (which we still need to fix, as it worries users), but can become a real issue if an MMAP queue and a USERPTR queue are created by a driver with the same queue lock.

Good point. Do you know any good solution to this problem?

http://patchwork.linuxtv.org/patch/8455/

It seems QBUF is safe, but PREPAREBUF isn't (both call __buf_prepare, which end up calling the memops get_userptr operation).

I'll post a patch to fix it for PREPAREBUF. If I'm not mistaken, you can drop the down_read/up_read here.

ok. Thanks for the link.

...
...
...
...

vma = find_vma(current->mm, start);

if (!vma) {

printk(KERN_ERR "no vma for address %lu\n", start);

goto cleanup;

}

if (vma_is_io(vma)) {

unsigned long pfn;

if (vma->vm_end - start < n_pages * PAGE_SIZE) {

printk(KERN_ERR "vma is too small\n");

goto cleanup;

}

for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {

ret = follow_pfn(vma, start, &pfn);

if (ret) {

printk(KERN_ERR "no page for address %lu\n",

start);

goto cleanup;

}

pages[n] = pfn_to_page(pfn);

get_page(pages[n]);

This worries me. When the VM_PFNMAP flag is set, the memory pages are not backed by a struct page. Creating a struct page pointer out of it can be an acceptable hack (for instance to store a page in an scatterlist with sg_set_page() and then retrieve its physical address with sg_phys()), but you should not expect the struct page to be valid for anything else. Calling get_page() on it will likely crash.

You are completetly right. This is the corner case where list of pages is not a portable way of describing the memory. Maybe pfn_valid should be used to check validity of the page (pfn) before getting it?

I think you should just drop the get_page() call. There's no page, so there's no need to get a reference count to it.

The problem is that get_user_pages does call get_page. Not calling get_page will break the symmetry between PFNMAP and non-PFNMAP buffers. Maybe checking page validity before get_page/put_page is enough?

PFNMAP and non-PFNMAP buffers are inherently different, so I don't see a problem in handling them differently. We will likely run into an issue though, with hardware such as the OMAP TILER, where the memory isn't backed by normal memory (and thus no struct page is present), but for which the target must be pinned somehow (in the case of the tiler, that would be a tiler mapping). I don't think we have an API to ask the kernel to pin a memory range regardless of how the memory is handled (system memory, reserved memory with PFNMAP, device mapping such as with the tiler, ...). This is an open issue. One possible solution is to deprecate USERPTR support for that kind of memory and use dma-buf instead.

...
...
The VM_PFNMAP flag is mostly used with memory out of the kernel allocator's control if I'm not mistaken. The main use case I've seen is memory reserved at boot time and use as a frame buffer for instance. In that case the pages can't go away, as there no page in the first place.

This won't fix the DMA SG problem though (see below).

...
...
...

}

} else {

n = get_user_pages(current, current->mm, start & PAGE_MASK,

n_pages, write, 1, pages, NULL);

if (n != n_pages) {

printk(KERN_ERR "got only %d of %d user pages\n",

n, n_pages);

goto cleanup;

}

}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {

printk(KERN_ERR "failed to copy vma\n");

ret = -ENOMEM;

goto cleanup;

}

Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.

I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.

OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this caught my eyes. If you find the reason why copying it is needed, please add a comment to the code.

The reason of copying vma was that 'struct vma' has no reference counters. Therefore it could be deleted after mm lock is freed, ending with freeing its all pages belonging to vma. To prevent it, a copy of vma is created. Notice that inside vb2_get_vma the callback open is called for original vma, preventing memory from being released. On vb2_put_vma the complementary close is called.

Feel free to prove me wrong, but I think get_user_pages() is enough to prevent the pages from being freed, even if the VMA is deleted.

However, there's one subtle issue that we will need to deal with when we will implement cache management. It took me a lot of time to debug and fix it when I was working on the OMAP3 ISP driver, so I'll explain it in the hope that someone will find it later when dealing with the same problems :-)

When a buffer is queued, the OMAP3 ISP driver cleans up the cache using the userspace mapping addresses (for USERPTR buffers). This might be a bad thing, but that's the way we currently implement that.

A prior get_user_pages() call will make sure pages are not freed, but due to optimization in the lockless memory management algorithms the userspace mapping can disappear: the kernel might consider that a page can be freed, remove its userspace mapping, and then find out that the page is locked. It will then move on without restoring the userspace mapping, which will be restored when the next page fault occurs.

When cleaning the cache using the userspace mapping addresses, any page for which the userspace mapping has been removed will trigger a page fault. The page fault handler (do_page_fault() in arm/arch/mm/fault.c) will try to read_lock mmap_sem. If it fails, it will check if the page fault occured in userspace context, or from a known exception location. As neither condition is true, it will panic.

The solution I found to this issue was to lock the VMA. This ensured that the userspace mapping would stay in place. See isp_video_buffer_lock_vma() in drivers/media/video/omap3isp/ispqueue.c. You could use a similar approach here if you want to ensure that userspace mappings are not removed, but once again I don't think that's needed (until we get to cache handling) as get_user_pages() will ensure that the pages are not freed.

I think the proper solution is to not use any user allocated memory and always use dma-buf. Some evil process thread might unlock the vma behind your back and you back to the original issue.

The linux memory management is not designed to easily allow use of user allocated memory by a device to do dma to/from it, at least not for the usecase where dma operation might happen over long period of time.

I guess some VMA change might help but this might also be hurt full and i am not familiar enough with the whole memory management to venture a guess on what kind if implication there is.

Cheers, Jerome Glisse

Laurent Pinchart

4:58 p.m.

New subject: [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring

Hi Jerome,

On Tuesday 27 March 2012 12:45:23 Jerome Glisse wrote:

...

On Tue, Mar 27, 2012 at 11:01 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 16:58:27 Tomasz Stanislawski wrote:

...
On 03/22/2012 03:42 PM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:

...
On 03/22/2012 11:50 AM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:

[snip]

...

...
...
...
...
...
> + *copy_vma = vb2_get_vma(vma); > + if (!*copy_vma) { > + printk(KERN_ERR "failed to copy vma\n"); > + ret = -ENOMEM; > + goto cleanup; > + }

Do we really need to make a copy of the VMA ? The only reason why we store a pointer to it is to check the flags in vb2_dc_put_userptr(). We could store the flags instead and avoid vb2_get_dma()/vb2_put_dma() calls altogether.

I remember that there was a very good reason of copying this vma structure. You caught me on 'cargo-cult' programming. I will do some reverse engineering and try to answer it soon.

OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this caught my eyes. If you find the reason why copying it is needed, please add a comment to the code.

The reason of copying vma was that 'struct vma' has no reference counters. Therefore it could be deleted after mm lock is freed, ending with freeing its all pages belonging to vma. To prevent it, a copy of vma is created. Notice that inside vb2_get_vma the callback open is called for original vma, preventing memory from being released. On vb2_put_vma the complementary close is called.

Feel free to prove me wrong, but I think get_user_pages() is enough to prevent the pages from being freed, even if the VMA is deleted.

However, there's one subtle issue that we will need to deal with when we will implement cache management. It took me a lot of time to debug and fix it when I was working on the OMAP3 ISP driver, so I'll explain it in the hope that someone will find it later when dealing with the same problems :-)

When a buffer is queued, the OMAP3 ISP driver cleans up the cache using the userspace mapping addresses (for USERPTR buffers). This might be a bad thing, but that's the way we currently implement that.

A prior get_user_pages() call will make sure pages are not freed, but due to optimization in the lockless memory management algorithms the userspace mapping can disappear: the kernel might consider that a page can be freed, remove its userspace mapping, and then find out that the page is locked. It will then move on without restoring the userspace mapping, which will be restored when the next page fault occurs.

When cleaning the cache using the userspace mapping addresses, any page for which the userspace mapping has been removed will trigger a page fault. The page fault handler (do_page_fault() in arm/arch/mm/fault.c) will try to read_lock mmap_sem. If it fails, it will check if the page fault occured in userspace context, or from a known exception location. As neither condition is true, it will panic.

The solution I found to this issue was to lock the VMA. This ensured that the userspace mapping would stay in place. See isp_video_buffer_lock_vma() in drivers/media/video/omap3isp/ispqueue.c. You could use a similar approach here if you want to ensure that userspace mappings are not removed, but once again I don't think that's needed (until we get to cache handling) as get_user_pages() will ensure that the pages are not freed.

I think the proper solution is to not use any user allocated memory and always use dma-buf. Some evil process thread might unlock the vma behind your back and you back to the original issue.

The linux memory management is not designed to easily allow use of user allocated memory by a device to do dma to/from it, at least not for the usecase where dma operation might happen over long period of time.

I guess some VMA change might help but this might also be hurt full and i am not familiar enough with the whole memory management to venture a guess on what kind if implication there is.

I agree with you, we should move away from using user-allocated memory. I just wanted to explain what I did and why for reference purpose, as it took me lots of time to debug the issue. Until every driver moves to dma-buf (and for some time after as well) we will still need to support user-allocated memory in V4L2, even if the solution isn't completely bullet-proof.

-- Regards, Laurent Pinchart

Subash Patel

22 Mar 22 Mar

1:42 p.m.

New subject: [RFCv2 PATCH 2/9] v4l: vb2-dma-contig: update and code refactoring

Hello Thomasz, Marek,

I found an issue with below patch. Please read below.

On 03/13/2012 03:47 PM, Tomasz Stanislawski wrote:

...

This patch combines updates and fixes to dma-contig allocator. Moreover the allocator code was refactored. The most important changes are:

functions were reordered

move compression of scatterlist to separete function

add support for multichunk but contiguous scatterlists

simplified implementation of vb2-dma-contig context structure

let mmap method to use dma_mmap_writecombine

add support for scatterlist in userptr mode

Signed-off-by: Marek Szyprowskim.szyprowski@samsung.com [mmap method] Signed-off-by: Andrzej Pietrasiewiczandrzej.p@samsung.com [scatterlist in userptr mode] Signed-off-by: Kamil Debskik.debski@samsung.com [bugfixing] Signed-off-by: Tomasz Stanislawskit.stanislaws@samsung.com [core refactoring, helper functions] Signed-off-by: Kyungmin Parkkyungmin.park@samsung.com

drivers/media/video/videobuf2-dma-contig.c | 495 +++++++++++++++++++++++----- 1 files changed, 414 insertions(+), 81 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index f17ad98..c1dc043 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -10,173 +10,506 @@

the Free Software Foundation.

*/

+#include<linux/dma-buf.h> +#include<linux/dma-mapping.h> #include<linux/module.h> +#include<linux/scatterlist.h> +#include<linux/sched.h> #include<linux/slab.h> -#include<linux/dma-mapping.h>

#include<media/videobuf2-core.h> #include<media/videobuf2-memops.h>

-struct vb2_dc_conf {

struct device *dev;

-};

struct vb2_dc_buf {

struct vb2_dc_conf *conf;

struct device *dev; void *vaddr;

dma_addr_t dma_addr; unsigned long size;

struct vm_area_struct *vma;

atomic_t refcount;

dma_addr_t dma_addr;

struct sg_table *dma_sgt;

enum dma_data_direction dma_dir;

/* MMAP related */ struct vb2_vmarea_handler handler;

atomic_t refcount;

struct sg_table *sgt_base;

/* USERPTR related */

struct vm_area_struct *vma; };

-static void vb2_dma_contig_put(void *buf_priv); +/*********************************************/ +/* scatterlist table functions */ +/*********************************************/

-static void *vb2_dma_contig_alloc(void *alloc_ctx, unsigned long size) +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages,

unsigned long n_pages, size_t offset, size_t offset2) {

struct vb2_dc_conf *conf = alloc_ctx;

struct vb2_dc_buf *buf;

struct sg_table *sgt;

int i, j; /* loop counters */

int cur_page, chunks;

int ret;

struct scatterlist *s;

buf = kzalloc(sizeof *buf, GFP_KERNEL);

if (!buf)

sgt = kzalloc(sizeof *sgt, GFP_KERNEL);

if (!sgt) return ERR_PTR(-ENOMEM);
buf->vaddr = dma_alloc_coherent(conf->dev, size,&buf->dma_addr,
			GFP_KERNEL);
if (!buf->vaddr) {
dev_err(conf->dev, "dma_alloc_coherent of size %ld failed\n",
	size);
kfree(buf);
/* compute number of chunks */

chunks = 1;

for (i = 1; i< n_pages; ++i)
if (pages[i] != pages[i - 1] + 1)
	++chunks;
ret = sg_alloc_table(sgt, chunks, GFP_KERNEL);

if (ret) {
kfree(sgt);
return ERR_PTR(-ENOMEM); }
buf->conf = conf;

buf->size = size;

buf->handler.refcount =&buf->refcount;

buf->handler.put = vb2_dma_contig_put;

buf->handler.arg = buf;
/* merging chunks and putting them into the scatterlist */

cur_page = 0;

for_each_sg(sgt->sgl, s, sgt->orig_nents, i) {
size_t size = PAGE_SIZE;
for (j = cur_page + 1; j<  n_pages; ++j) {
	if (pages[j] != pages[j - 1] + 1)
		break;
	size += PAGE_SIZE;
}
/* cut offset if chunk starts at the first page */
if (cur_page == 0)
	size -= offset;
/* cut offset2 if chunk ends at the last page */
if (j == n_pages)
	size -= offset2;
sg_set_page(s, pages[cur_page], size, offset);
offset = 0;
cur_page = j;
}
atomic_inc(&buf->refcount);

return sgt;

+}

return buf;

+static void vb2_dc_release_sgtable(struct sg_table *sgt) +{

sg_free_table(sgt);

kfree(sgt); }

-static void vb2_dma_contig_put(void *buf_priv) +static void vb2_dc_put_sgtable(struct sg_table *sgt, int dirty) {

struct vb2_dc_buf *buf = buf_priv;
struct scatterlist *s;

int i, j;

for_each_sg(sgt->sgl, s, sgt->nents, i) {
struct page *page = sg_page(s);
int n_pages = PAGE_ALIGN(s->offset + s->length)>>  PAGE_SHIFT;
for (j = 0; j<  n_pages; ++j, ++page) {
	if (dirty)
		set_page_dirty_lock(page);
	put_page(page);
}
}

vb2_dc_release_sgtable(sgt);
+}
if (atomic_dec_and_test(&buf->refcount)) {
dma_free_coherent(buf->conf->dev, buf->size, buf->vaddr,
		  buf->dma_addr);
kfree(buf);
+static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt) +{
struct scatterlist *s;

dma_addr_t expected = sg_dma_address(sgt->sgl);

int i;

unsigned long size = 0;

for_each_sg(sgt->sgl, s, sgt->nents, i) {
if (sg_dma_address(s) != expected)
	break;
expected = sg_dma_address(s) + sg_dma_len(s);
size += sg_dma_len(s);
}
return size; }
-static void *vb2_dma_contig_cookie(void *buf_priv) +/*********************************************/ +/* callbacks for all buffers */ +/*********************************************/

+static void *vb2_dc_cookie(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

return&buf->dma_addr; }

-static void *vb2_dma_contig_vaddr(void *buf_priv) +static void *vb2_dc_vaddr(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;
if (!buf)
return 0;
return buf->vaddr; }
-static unsigned int vb2_dma_contig_num_users(void *buf_priv) +static unsigned int vb2_dc_num_users(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

return atomic_read(&buf->refcount); }

-static int vb2_dma_contig_mmap(void *buf_priv, struct vm_area_struct *vma) +static void vb2_dc_prepare(void *buf_priv) { struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;
if (!buf) {
printk(KERN_ERR "No buffer to map\n");
return -EINVAL;
}
if (!sgt)
return;
dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}

+static void vb2_dc_finish(void *buf_priv) +{
struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

if (!sgt)
return;
dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}

+/*********************************************/ +/* callbacks for MMAP buffers */ +/*********************************************/

+static void vb2_dc_put(void *buf_priv) +{

struct vb2_dc_buf *buf = buf_priv;
return vb2_mmap_pfn_range(vma, buf->dma_addr, buf->size,
		&vb2_common_vm_ops,&buf->handler);
if (!atomic_dec_and_test(&buf->refcount))
return;
vb2_dc_release_sgtable(buf->sgt_base);

dma_free_coherent(buf->dev, buf->size, buf->vaddr,
buf->dma_addr);
kfree(buf); }
-static void *vb2_dma_contig_get_userptr(void *alloc_ctx, unsigned long vaddr,
			unsigned long size, int write)
+static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size) {

struct device *dev = alloc_ctx; struct vb2_dc_buf *buf;

struct vm_area_struct *vma;

dma_addr_t dma_addr = 0; int ret;

int n_pages;

struct page **pages = NULL;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM);
ret = vb2_get_contig_userptr(vaddr, size,&vma,&dma_addr);

if (ret) {
printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
		vaddr);
kfree(buf);
return ERR_PTR(ret);
buf->dev = dev;

buf->size = size;

buf->vaddr = dma_alloc_coherent(buf->dev, buf->size,&buf->dma_addr,
GFP_KERNEL);

Memory allocation can happen from device coherent or alloc_pages(). Pls ref to arm_dma_alloc() in dma mapping v7 patchset. This info will be elaborated more below.

...

ret = -ENOMEM;

if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
	size);
goto fail_buf;
}
buf->size = size;

buf->dma_addr = dma_addr;

buf->vma = vma;
WARN_ON((unsigned long)buf->vaddr& ~PAGE_MASK);

WARN_ON(buf->dma_addr& ~PAGE_MASK);

n_pages = PAGE_ALIGN(size)>> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
printk(KERN_ERR "failed to alloc page table\n");
goto fail_dma;
}

ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);

if (ret< 0) {
printk(KERN_ERR "failed to get buffer pages from DMA API\n");
goto fail_pages;
}

if (ret != n_pages) {
ret = -EFAULT;
printk(KERN_ERR "failed to get all pages from DMA API\n");
goto fail_pages;
}

buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);

if (IS_ERR(buf->sgt_base)) {
ret = PTR_ERR(buf->sgt_base);
printk(KERN_ERR "failed to prepare sg table\n");
goto fail_pages;
}

/* pages are no longer needed */

kfree(pages);

buf->handler.refcount =&buf->refcount;

buf->handler.put = vb2_dc_put;

buf->handler.arg = buf;

atomic_inc(&buf->refcount);

return buf;
+fail_pages:

kfree(pages);

+fail_dma:

dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);

+fail_buf:

kfree(buf);

return ERR_PTR(ret);

+}

+static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) +{
struct vb2_dc_buf *buf = buf_priv;

int ret;

/*
* dma_mmap_* uses vm_pgoff as in-buffer offset, but we want to
* map whole buffer
*/
vma->vm_pgoff = 0;

ret = dma_mmap_writecombine(buf->dev, vma, buf->vaddr,
buf->dma_addr, buf->size);

For devices, which do not have a coherent/reserved pool, then the allocation in function __dma_alloc() happens, and the memory will be remapped by calling __dma_alloc_remap(). In the above function, arm_vmregion_find() will be called, and it succeeds to get the map and it will be mapped to user.

This has issues with devices with coherent memory. If we have any device which has declared coherent memory, then the allocation happens from per-device coherent area. Eg: MFC which declares: s5p_mfc_reserve_mem(). In that case, dma_mmap_writecombine() fails, and hence vb2_dc_mmap() fails as well.

I will reply the same in the dma mapping thread as well, as I dont want to combine both these RFC's together.

...

if (ret) {
printk(KERN_ERR "Remapping memory failed, error: %d\n", ret);
return ret;
}

vma->vm_flags |= VM_DONTEXPAND | VM_RESERVED;

vma->vm_private_data =&buf->handler;

vma->vm_ops =&vb2_common_vm_ops;

vma->vm_ops->open(vma);

printk(KERN_DEBUG "%s: mapped dma addr 0x%08lx at 0x%08lx, size %ld\n",
__func__, (unsigned long)buf->dma_addr, vma->vm_start,
buf->size);
return 0; }
-static void vb2_dma_contig_put_userptr(void *mem_priv) +/*********************************************/ +/* callbacks for USERPTR buffers */ +/*********************************************/

+static inline int vma_is_io(struct vm_area_struct *vma) {

struct vb2_dc_buf *buf = mem_priv;

return !!(vma->vm_flags& (VM_IO | VM_PFNMAP));

+}

+static int vb2_dc_get_pages(unsigned long start, struct page **pages,

int n_pages, struct vm_area_struct **copy_vma, int write)

+{
struct vm_area_struct *vma;

int n = 0; /* number of get pages */

int ret = -EFAULT;

/* entering critical section for mm access */

down_read(&current->mm->mmap_sem);

vma = find_vma(current->mm, start);

if (!vma) {
printk(KERN_ERR "no vma for address %lu\n", start);
goto cleanup;
}

if (vma_is_io(vma)) {
unsigned long pfn;
if (vma->vm_end - start<  n_pages * PAGE_SIZE) {
	printk(KERN_ERR "vma is too small\n");
	goto cleanup;
}
for (n = 0; n<  n_pages; ++n, start += PAGE_SIZE) {
	ret = follow_pfn(vma, start,&pfn);
	if (ret) {
		printk(KERN_ERR "no page for address %lu\n",
			start);
		goto cleanup;
	}
	pages[n] = pfn_to_page(pfn);
	get_page(pages[n]);
}
} else {
n = get_user_pages(current, current->mm, start&  PAGE_MASK,
	n_pages, write, 1, pages, NULL);
if (n != n_pages) {
	printk(KERN_ERR "got only %d of %d user pages\n",
		n, n_pages);
	goto cleanup;
}
}

*copy_vma = vb2_get_vma(vma);

if (!*copy_vma) {
printk(KERN_ERR "failed to copy vma\n");
ret = -ENOMEM;
goto cleanup;
}

/* leaving critical section for mm access */

up_read(&current->mm->mmap_sem);

return 0;
+cleanup:
up_read(&current->mm->mmap_sem);

/* putting user pages if used, can be done wothout the lock */

while (n)
put_page(pages[--n]);
return ret;
+}

+static void vb2_dc_put_userptr(void *buf_priv) +{

struct vb2_dc_buf *buf = buf_priv;

struct sg_table *sgt = buf->dma_sgt;

dma_unmap_sg(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);

vb2_dc_put_sgtable(sgt, !vma_is_io(buf->vma));

vb2_put_vma(buf->vma);

kfree(buf);

+}

+static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,

unsigned long size, int write)

+{

struct vb2_dc_buf *buf;

unsigned long start, end, offset, offset2;

struct page **pages;

int n_pages;

int ret = 0;

struct sg_table *sgt;

unsigned long contig_size;

buf = kzalloc(sizeof *buf, GFP_KERNEL); if (!buf)
return;
return ERR_PTR(-ENOMEM);
buf->dev = alloc_ctx;

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

start = (unsigned long)vaddr& PAGE_MASK;

offset = (unsigned long)vaddr& ~PAGE_MASK;

end = PAGE_ALIGN((unsigned long)vaddr + size);

offset2 = end - (unsigned long)vaddr - size;

n_pages = (end - start)>> PAGE_SHIFT;

pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);

if (!pages) {
ret = -ENOMEM;
printk(KERN_ERR "failed to allocate pages table\n");
goto fail_buf;
}

/* extract page list from userspace mapping */

ret = vb2_dc_get_pages(start, pages, n_pages,&buf->vma, write);

if (ret) {
printk(KERN_ERR "failed to get user pages\n");
goto fail_pages;
}

sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);

if (!sgt) {
printk(KERN_ERR "failed to create scatterlist table\n");
ret = -ENOMEM;
goto fail_get_pages;
}

/* pages are no longer needed */

kfree(pages);

pages = NULL;

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents<= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
ret = -EIO;
goto fail_sgt;
}

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size< size) {
printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
	contig_size, size);
ret = -EFAULT;
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->size = size;

buf->dma_sgt = sgt;

atomic_inc(&buf->refcount);

return buf;
+fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);

+fail_sgt:

vb2_dc_put_sgtable(sgt, 0);

+fail_get_pages:
while (pages&& n_pages)
put_page(pages[--n_pages]);
vb2_put_vma(buf->vma);
+fail_pages:

kfree(pages); /* kfree is NULL-proof */

+fail_buf: kfree(buf);

return ERR_PTR(ret); }

+/*********************************************/ +/* DMA CONTIG exported functions */ +/*********************************************/

const struct vb2_mem_ops vb2_dma_contig_memops = {

.alloc = vb2_dma_contig_alloc,

.put = vb2_dma_contig_put,

.cookie = vb2_dma_contig_cookie,

.vaddr = vb2_dma_contig_vaddr,

.mmap = vb2_dma_contig_mmap,

.get_userptr = vb2_dma_contig_get_userptr,

.put_userptr = vb2_dma_contig_put_userptr,

.num_users = vb2_dma_contig_num_users,

.alloc = vb2_dc_alloc,

.put = vb2_dc_put,

.cookie = vb2_dc_cookie,

.vaddr = vb2_dc_vaddr,

.mmap = vb2_dc_mmap,

.get_userptr = vb2_dc_get_userptr,

.put_userptr = vb2_dc_put_userptr,

.prepare = vb2_dc_prepare,

.finish = vb2_dc_finish,

.num_users = vb2_dc_num_users, }; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

void *vb2_dma_contig_init_ctx(struct device *dev) {
struct vb2_dc_conf *conf;

conf = kzalloc(sizeof *conf, GFP_KERNEL);

if (!conf)
return ERR_PTR(-ENOMEM);
conf->dev = dev;

return conf;
return dev; } EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);

void vb2_dma_contig_cleanup_ctx(void *alloc_ctx) {

kfree(alloc_ctx); } EXPORT_SYMBOL_GPL(vb2_dma_contig_cleanup_ctx);

Regards, Subash

Tomasz Stanislawski

13 Mar 13 Mar

10:17 a.m.

New subject: [RFCv2 PATCH 3/9] v4l: vb2: Add dma-contig allocator as dma_buf user

From: Sumit Semwal sumit.semwal@ti.com

This patch makes changes for adding dma-contig as a dma_buf user. It provides function implementations for the {attach, detach, map, unmap}_dmabuf() mem_ops of DMABUF memory type.

Signed-off-by: Sumit Semwal sumit.semwal@ti.com Signed-off-by: Sumit Semwal sumit.semwal@linaro.org [author of the original patch] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [integration with refactored dma-contig allocator] --- drivers/media/video/videobuf2-dma-contig.c | 116 ++++++++++++++++++++++++++++ 1 files changed, 116 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c1dc043..746dd5f 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -35,6 +35,9 @@ struct vb2_dc_buf {

/* USERPTR related */ struct vm_area_struct *vma; + + /* DMABUF related */ + struct dma_buf_attachment *db_attach; };

/*********************************************/ @@ -485,6 +488,115 @@ fail_buf: }

/*********************************************/ +/* callbacks for DMABUF buffers */ +/*********************************************/ + +static int vb2_dc_map_dmabuf(void *mem_priv) +{ + struct vb2_dc_buf *buf = mem_priv; + struct sg_table *sgt; + unsigned long contig_size; + int ret = 0; + + if (WARN_ON(!buf->db_attach)) { + printk(KERN_ERR "trying to pin a non attached buffer\n"); + return -EINVAL; + } + + if (WARN_ON(buf->dma_sgt)) { + printk(KERN_ERR "dmabuf buffer is already pinned\n"); + return 0; + } + + /* get the associated scatterlist for this buffer */ + sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir); + if (IS_ERR_OR_NULL(sgt)) { + printk(KERN_ERR "Error getting dmabuf scatterlist\n"); + return -EINVAL; + } + + /* checking if dmabuf is big enough to store contiguous chunk */ + contig_size = vb2_dc_get_contiguous_size(sgt); + if (contig_size < buf->size) { + printk(KERN_ERR "contiguous chunk of dmabuf is too small\n"); + ret = -EFAULT; + goto fail_map; + } + + buf->dma_addr = sg_dma_address(sgt->sgl); + buf->dma_sgt = sgt; + + return 0; + +fail_map: + dma_buf_unmap_attachment(buf->db_attach, sgt); + + return ret; +} + +static void vb2_dc_unmap_dmabuf(void *mem_priv) +{ + struct vb2_dc_buf *buf = mem_priv; + struct sg_table *sgt = buf->dma_sgt; + + if (WARN_ON(!buf->db_attach)) { + printk(KERN_ERR "trying to unpin a not attached buffer\n"); + return; + } + + if (WARN_ON(!sgt)) { + printk(KERN_ERR "dmabuf buffer is already unpinned\n"); + return; + } + + dma_buf_unmap_attachment(buf->db_attach, sgt); + + buf->dma_addr = 0; + buf->dma_sgt = NULL; +} + +static void vb2_dc_detach_dmabuf(void *mem_priv) +{ + struct vb2_dc_buf *buf = mem_priv; + + if (buf->dma_addr) + vb2_dc_unmap_dmabuf(buf); + + /* detach this attachment */ + dma_buf_detach(buf->db_attach->dmabuf, buf->db_attach); + kfree(buf); +} + +static void *vb2_dc_attach_dmabuf(void *alloc_ctx, struct dma_buf *dbuf, + unsigned long size, int write) +{ + struct vb2_dc_buf *buf; + struct dma_buf_attachment *dba; + + if (dbuf->size < size) + return ERR_PTR(-EFAULT); + + buf = kzalloc(sizeof *buf, GFP_KERNEL); + if (!buf) + return ERR_PTR(-ENOMEM); + + buf->dev = alloc_ctx; + /* create attachment for the dmabuf with the user device */ + dba = dma_buf_attach(dbuf, buf->dev); + if (IS_ERR(dba)) { + printk(KERN_ERR "failed to attach dmabuf\n"); + kfree(buf); + return dba; + } + + buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + buf->size = size; + buf->db_attach = dba; + + return buf; +} + +/*********************************************/ /* DMA CONTIG exported functions */ /*********************************************/

@@ -498,6 +610,10 @@ const struct vb2_mem_ops vb2_dma_contig_memops = { .put_userptr = vb2_dc_put_userptr, .prepare = vb2_dc_prepare, .finish = vb2_dc_finish, + .map_dmabuf = vb2_dc_map_dmabuf, + .unmap_dmabuf = vb2_dc_unmap_dmabuf, + .attach_dmabuf = vb2_dc_attach_dmabuf, + .detach_dmabuf = vb2_dc_detach_dmabuf, .num_users = vb2_dc_num_users, }; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

-- 1.7.5.4

Laurent Pinchart

22 Mar 22 Mar

11:04 a.m.

New subject: [RFCv2 PATCH 3/9] v4l: vb2: Add dma-contig allocator as dma_buf user

Hi Tomasz,

On Tuesday 13 March 2012 11:17:01 Tomasz Stanislawski wrote:

...

From: Sumit Semwal sumit.semwal@ti.com

This patch makes changes for adding dma-contig as a dma_buf user. It provides function implementations for the {attach, detach, map, unmap}_dmabuf() mem_ops of DMABUF memory type.

Signed-off-by: Sumit Semwal sumit.semwal@ti.com Signed-off-by: Sumit Semwal sumit.semwal@linaro.org [author of the original patch] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [integration with refactored dma-contig allocator]

drivers/media/video/videobuf2-dma-contig.c | 116 ++++++++++++++++++++++++++++ 1 files changed, 116 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c1dc043..746dd5f 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -35,6 +35,9 @@ struct vb2_dc_buf {

/* USERPTR related */ struct vm_area_struct *vma;

/* DMABUF related */

struct dma_buf_attachment *db_attach;

};

/*********************************************/ @@ -485,6 +488,115 @@ fail_buf: }

/*********************************************/ +/* callbacks for DMABUF buffers */ +/*********************************************/

+static int vb2_dc_map_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

struct sg_table *sgt;

unsigned long contig_size;

int ret = 0;

if (WARN_ON(!buf->db_attach)) {
printk(KERN_ERR "trying to pin a non attached buffer\n");
return -EINVAL;
}

if (WARN_ON(buf->dma_sgt)) {
printk(KERN_ERR "dmabuf buffer is already pinned\n");
return 0;
}

/* get the associated scatterlist for this buffer */

sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);

if (IS_ERR_OR_NULL(sgt)) {
printk(KERN_ERR "Error getting dmabuf scatterlist\n");
return -EINVAL;
}

I've checked why dma_map_sg() was missing from here, and then remembered that the exporter is still responsible for mapping the buffer to the importer's device address space :-) I'll raise that topic separately.

...

/* checking if dmabuf is big enough to store contiguous chunk */

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < buf->size) {
printk(KERN_ERR "contiguous chunk of dmabuf is too small\n");
ret = -EFAULT;
goto fail_map;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->dma_sgt = sgt;

return 0;
+fail_map:

dma_buf_unmap_attachment(buf->db_attach, sgt);

return ret;

+}

+static void vb2_dc_unmap_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

struct sg_table *sgt = buf->dma_sgt;

if (WARN_ON(!buf->db_attach)) {
printk(KERN_ERR "trying to unpin a not attached buffer\n");
return;
}

if (WARN_ON(!sgt)) {
printk(KERN_ERR "dmabuf buffer is already unpinned\n");
return;
}

dma_buf_unmap_attachment(buf->db_attach, sgt);

buf->dma_addr = 0;

buf->dma_sgt = NULL;
+}

+static void vb2_dc_detach_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

if (buf->dma_addr)
vb2_dc_unmap_dmabuf(buf);

Can detach() be called with the buffer still mapped() ? Wouldn't that be a bug in the caller ?

...

/* detach this attachment */

dma_buf_detach(buf->db_attach->dmabuf, buf->db_attach);

kfree(buf);

+}

There's nothing allocator-specific in the attach/detach operations. Shouldn't they be moved to the vb2 core ?

...

+static void *vb2_dc_attach_dmabuf(void *alloc_ctx, struct dma_buf *dbuf,

unsigned long size, int write)

+{
struct vb2_dc_buf *buf;

struct dma_buf_attachment *dba;

if (dbuf->size < size)
return ERR_PTR(-EFAULT);
buf = kzalloc(sizeof *buf, GFP_KERNEL);

if (!buf)
return ERR_PTR(-ENOMEM);
buf->dev = alloc_ctx;

/* create attachment for the dmabuf with the user device */

dba = dma_buf_attach(dbuf, buf->dev);

if (IS_ERR(dba)) {
printk(KERN_ERR "failed to attach dmabuf\n");
kfree(buf);
return dba;
}

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

buf->size = size;

buf->db_attach = dba;

return buf;
+}

+/*********************************************/ /* DMA CONTIG exported functions */ /*********************************************/

@@ -498,6 +610,10 @@ const struct vb2_mem_ops vb2_dma_contig_memops = { .put_userptr = vb2_dc_put_userptr, .prepare = vb2_dc_prepare, .finish = vb2_dc_finish,

.map_dmabuf = vb2_dc_map_dmabuf,

.unmap_dmabuf = vb2_dc_unmap_dmabuf,

.attach_dmabuf = vb2_dc_attach_dmabuf,

.detach_dmabuf = vb2_dc_detach_dmabuf, .num_users = vb2_dc_num_users,

}; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

-- Regards, Laurent Pinchart

Tomasz Stanislawski

26 Mar 26 Mar

3:53 p.m.

New subject: [RFCv2 PATCH 3/9] v4l: vb2: Add dma-contig allocator as dma_buf user

Hi Laurent, Please refer to the comments below.

On 03/22/2012 12:04 PM, Laurent Pinchart wrote:

...

Hi Tomasz,

On Tuesday 13 March 2012 11:17:01 Tomasz Stanislawski wrote:

...
From: Sumit Semwal sumit.semwal@ti.com

This patch makes changes for adding dma-contig as a dma_buf user. It provides function implementations for the {attach, detach, map, unmap}_dmabuf() mem_ops of DMABUF memory type.

Signed-off-by: Sumit Semwal sumit.semwal@ti.com Signed-off-by: Sumit Semwal sumit.semwal@linaro.org [author of the original patch] Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com [integration with refactored dma-contig allocator]

drivers/media/video/videobuf2-dma-contig.c | 116 ++++++++++++++++++++++++++++ 1 files changed, 116 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index c1dc043..746dd5f 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -35,6 +35,9 @@ struct vb2_dc_buf {

/* USERPTR related */ struct vm_area_struct *vma;

/* DMABUF related */

struct dma_buf_attachment *db_attach;

};

/*********************************************/ @@ -485,6 +488,115 @@ fail_buf: }

/*********************************************/ +/* callbacks for DMABUF buffers */ +/*********************************************/

+static int vb2_dc_map_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

struct sg_table *sgt;

unsigned long contig_size;

int ret = 0;

if (WARN_ON(!buf->db_attach)) {
printk(KERN_ERR "trying to pin a non attached buffer\n");
return -EINVAL;
}

if (WARN_ON(buf->dma_sgt)) {
printk(KERN_ERR "dmabuf buffer is already pinned\n");
return 0;
}

/* get the associated scatterlist for this buffer */

sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);

if (IS_ERR_OR_NULL(sgt)) {
printk(KERN_ERR "Error getting dmabuf scatterlist\n");
return -EINVAL;
}
I've checked why dma_map_sg() was missing from here, and then remembered that the exporter is still responsible for mapping the buffer to the importer's device address space :-) I'll raise that topic separately.

...
/* checking if dmabuf is big enough to store contiguous chunk */

contig_size = vb2_dc_get_contiguous_size(sgt);

if (contig_size < buf->size) {
printk(KERN_ERR "contiguous chunk of dmabuf is too small\n");
ret = -EFAULT;
goto fail_map;
}

buf->dma_addr = sg_dma_address(sgt->sgl);

buf->dma_sgt = sgt;

return 0;
+fail_map:

dma_buf_unmap_attachment(buf->db_attach, sgt);

return ret;

+}

+static void vb2_dc_unmap_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

struct sg_table *sgt = buf->dma_sgt;

if (WARN_ON(!buf->db_attach)) {
printk(KERN_ERR "trying to unpin a not attached buffer\n");
return;
}

if (WARN_ON(!sgt)) {
printk(KERN_ERR "dmabuf buffer is already unpinned\n");
return;
}

dma_buf_unmap_attachment(buf->db_attach, sgt);

buf->dma_addr = 0;

buf->dma_sgt = NULL;
+}

+static void vb2_dc_detach_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

if (buf->dma_addr)
vb2_dc_unmap_dmabuf(buf);
Can detach() be called with the buffer still mapped() ? Wouldn't that be a bug in the caller ?

No, it is not. The functions from vb2_dc_*_dmabuf are called by vb2-core. It is not a part of DMABUF API itself. Therefore usage can be a bit different. Please refer to the function __qbuf_dmabuf function vb2-core. If new DMABUF is passed to in the QBUF then old DMABUF is released by calling detach_dmabuf without unmap. However, this part of code is probably not reachable if the buffer was not dequeued.

Detach without unmap may happen in a case of closing a video fd without prior dequeuing of all buffers.

Do you think it would be a good idea to move a call map_dmabuf to __enqueue_in_driver and add calling unmap_dmabuf to vb2_buffer_done?

...

...

/* detach this attachment */

dma_buf_detach(buf->db_attach->dmabuf, buf->db_attach);

kfree(buf);

+}

There's nothing allocator-specific in the attach/detach operations. Shouldn't they be moved to the vb2 core ?

Calling dma_buf_attach could be moved to vb2-core. But it gives little gain. First, the pointer of dma_buf_attachment would have to be added to struct vb2_plane. Second, the allocator would have to keep in the copy of this pointer in its context structure for use of vb2_dc_(un)map_dmabuf functions. Third, dma_buf_attach requires a pointer to 'struct device' which is not available in the vb2-core layer.

Because of the mentioned reasons I decided to keep attach_dmabuf in allocator-specific code.

...

...
+static void *vb2_dc_attach_dmabuf(void *alloc_ctx, struct dma_buf *dbuf,

unsigned long size, int write)

+{
struct vb2_dc_buf *buf;

struct dma_buf_attachment *dba;

if (dbuf->size < size)
return ERR_PTR(-EFAULT);
buf = kzalloc(sizeof *buf, GFP_KERNEL);

if (!buf)
return ERR_PTR(-ENOMEM);
buf->dev = alloc_ctx;

/* create attachment for the dmabuf with the user device */

dba = dma_buf_attach(dbuf, buf->dev);

if (IS_ERR(dba)) {
printk(KERN_ERR "failed to attach dmabuf\n");
kfree(buf);
return dba;
}

buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;

buf->size = size;

buf->db_attach = dba;

return buf;
+}

+/*********************************************/ /* DMA CONTIG exported functions */ /*********************************************/

@@ -498,6 +610,10 @@ const struct vb2_mem_ops vb2_dma_contig_memops = { .put_userptr = vb2_dc_put_userptr, .prepare = vb2_dc_prepare, .finish = vb2_dc_finish,

.map_dmabuf = vb2_dc_map_dmabuf,

.unmap_dmabuf = vb2_dc_unmap_dmabuf,

.attach_dmabuf = vb2_dc_attach_dmabuf,

.detach_dmabuf = vb2_dc_detach_dmabuf, .num_users = vb2_dc_num_users,

}; EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);

Best regards, Tomasz Stanislawski

Laurent Pinchart

28 Mar 28 Mar

3:50 p.m.

New subject: [RFCv2 PATCH 3/9] v4l: vb2: Add dma-contig allocator as dma_buf user

Hi Tomasz,

On Monday 26 March 2012 17:53:09 Tomasz Stanislawski wrote:

...

On 03/22/2012 12:04 PM, Laurent Pinchart wrote:

...
On Tuesday 13 March 2012 11:17:01 Tomasz Stanislawski wrote:

[snip]

...

...
...
+static void vb2_dc_detach_dmabuf(void *mem_priv) +{
struct vb2_dc_buf *buf = mem_priv;

if (buf->dma_addr)
vb2_dc_unmap_dmabuf(buf);
Can detach() be called with the buffer still mapped() ? Wouldn't that be a bug in the caller ?
No, it is not. The functions from vb2_dc_*_dmabuf are called by vb2-core. It is not a part of DMABUF API itself. Therefore usage can be a bit different. Please refer to the function __qbuf_dmabuf function vb2-core. If new DMABUF is passed to in the QBUF then old DMABUF is released by calling detach_dmabuf without unmap. However, this part of code is probably not reachable if the buffer was not dequeued.

Detach without unmap may happen in a case of closing a video fd without prior dequeuing of all buffers.

Do you think it would be a good idea to move a call map_dmabuf to __enqueue_in_driver and add calling unmap_dmabuf to vb2_buffer_done?

That's hard to tell. From the DRM point of view, I expect that you will be asked to keep the buffer mapped for as little time as possible. This is a sane behaviour, but will have a performance impact as we will have to constantly map/unmap buffers. From a V4L2 point of view I would prefer keeping the mappins when possible. This topic has already been discussed, and we haven't reached any consensus yet as far as I know :-/

...

...
...

/* detach this attachment */

dma_buf_detach(buf->db_attach->dmabuf, buf->db_attach);

kfree(buf);

+}

There's nothing allocator-specific in the attach/detach operations. Shouldn't they be moved to the vb2 core ?

Calling dma_buf_attach could be moved to vb2-core. But it gives little gain. First, the pointer of dma_buf_attachment would have to be added to struct vb2_plane. Second, the allocator would have to keep in the copy of this pointer in its context structure for use of vb2_dc_(un)map_dmabuf functions.

Right. Would it make sense to pass the vb2_plane pointer, or possibly the dma_buf_attachment pointer, to the mmap_dmabuf and unmap_dmabuf operations ?

...

Third, dma_buf_attach requires a pointer to 'struct device' which is not available in the vb2-core layer.

OK, that's a problem.

...

Because of the mentioned reasons I decided to keep attach_dmabuf in allocator-specific code.

Maybe it would make sense to create a vb2_mem_buf structure from which vb2_dc_buf (and other allocator-specific buffer descriptors) would inherit ? That structure would store the dma_buf_attach pointer, and common dma-buf code could be put in videobuf2-memops.c and shared between allocators. Just an idea.

-- Regards, Laurent Pinchart

Tomasz Stanislawski

13 Mar 13 Mar

10:17 a.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

This patch adds extension to V4L2 api. It allow to export a mmap buffer as file descriptor. New ioctl VIDIOC_EXPBUF is added. It takes a buffer offset used by mmap and return a file descriptor on success.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-ioctl.c | 11 +++++++++++ include/linux/videodev2.h | 20 ++++++++++++++++++++ include/media/v4l2-ioctl.h | 2 ++ 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/v4l2-compat-ioctl32.c b/drivers/media/video/v4l2-compat-ioctl32.c index e6f67aa..fd157cb 100644 --- a/drivers/media/video/v4l2-compat-ioctl32.c +++ b/drivers/media/video/v4l2-compat-ioctl32.c @@ -954,6 +954,7 @@ long v4l2_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg) case VIDIOC_S_FBUF32: case VIDIOC_OVERLAY32: case VIDIOC_QBUF32: + case VIDIOC_EXPBUF: case VIDIOC_DQBUF32: case VIDIOC_STREAMON32: case VIDIOC_STREAMOFF32: diff --git a/drivers/media/video/v4l2-ioctl.c b/drivers/media/video/v4l2-ioctl.c index 74cab51..a125016 100644 --- a/drivers/media/video/v4l2-ioctl.c +++ b/drivers/media/video/v4l2-ioctl.c @@ -207,6 +207,7 @@ static const char *v4l2_ioctls[] = { [_IOC_NR(VIDIOC_S_FBUF)] = "VIDIOC_S_FBUF", [_IOC_NR(VIDIOC_OVERLAY)] = "VIDIOC_OVERLAY", [_IOC_NR(VIDIOC_QBUF)] = "VIDIOC_QBUF", + [_IOC_NR(VIDIOC_EXPBUF)] = "VIDIOC_EXPBUF", [_IOC_NR(VIDIOC_DQBUF)] = "VIDIOC_DQBUF", [_IOC_NR(VIDIOC_STREAMON)] = "VIDIOC_STREAMON", [_IOC_NR(VIDIOC_STREAMOFF)] = "VIDIOC_STREAMOFF", @@ -938,6 +939,16 @@ static long __video_do_ioctl(struct file *file, dbgbuf(cmd, vfd, p); break; } + case VIDIOC_EXPBUF: + { + struct v4l2_exportbuffer *p = arg; + + if (!ops->vidioc_expbuf) + break; + + ret = ops->vidioc_expbuf(file, fh, p); + break; + } case VIDIOC_DQBUF: { struct v4l2_buffer *p = arg; diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer { #define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/** + * struct v4l2_exportbuffer - export of video buffer as DMABUF file descriptor + * + * @fd: file descriptor associated with DMABUF (set by driver) + * @mem_offset: for non-multiplanar buffers with memory == V4L2_MEMORY_MMAP; + * offset from the start of the device memory for this plane, + * (or a "cookie" that should be passed to mmap() as offset) + * + * Contains data used for exporting a video buffer as DMABUF file + * descriptor. Uses the same 'cookie' as mmap() syscall. All reserved fields + * must be set to zero. + */ +struct v4l2_exportbuffer { + __u32 fd; + __u32 reserved0; + __u32 mem_offset; + __u32 reserved[13]; +}; + /* * O V E R L A Y P R E V I E W */ @@ -2303,6 +2322,7 @@ struct v4l2_create_buffers { #define VIDIOC_S_FBUF _IOW('V', 11, struct v4l2_framebuffer) #define VIDIOC_OVERLAY _IOW('V', 14, int) #define VIDIOC_QBUF _IOWR('V', 15, struct v4l2_buffer) +#define VIDIOC_EXPBUF _IOWR('V', 16, struct v4l2_exportbuffer) #define VIDIOC_DQBUF _IOWR('V', 17, struct v4l2_buffer) #define VIDIOC_STREAMON _IOW('V', 18, int) #define VIDIOC_STREAMOFF _IOW('V', 19, int) diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h index 4df031a..d8716c6f 100644 --- a/include/media/v4l2-ioctl.h +++ b/include/media/v4l2-ioctl.h @@ -120,6 +120,8 @@ struct v4l2_ioctl_ops { int (*vidioc_reqbufs) (struct file *file, void *fh, struct v4l2_requestbuffers *b); int (*vidioc_querybuf)(struct file *file, void *fh, struct v4l2_buffer *b); int (*vidioc_qbuf) (struct file *file, void *fh, struct v4l2_buffer *b); + int (*vidioc_expbuf) (struct file *file, void *fh, + struct v4l2_exportbuffer *e); int (*vidioc_dqbuf) (struct file *file, void *fh, struct v4l2_buffer *b);

int (*vidioc_create_bufs)(struct file *file, void *fh, struct v4l2_create_buffers *b);

-- 1.7.5.4

Daniel Vetter

18 Mar 18 Mar

9:47 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

On Tue, Mar 13, 2012 at 11:17:02AM +0100, Tomasz Stanislawski wrote:

...

This patch adds extension to V4L2 api. It allow to export a mmap buffer as file descriptor. New ioctl VIDIOC_EXPBUF is added. It takes a buffer offset used by mmap and return a file descriptor on success.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

Just a quick comment: You need to add a flags parameter to at least support O_CLOSEXEC semantics on the returned fd. Otherwise library writes will plainly hate you and demand a new version with this support added. -Daniel

...

drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-ioctl.c | 11 +++++++++++ include/linux/videodev2.h | 20 ++++++++++++++++++++ include/media/v4l2-ioctl.h | 2 ++ 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/v4l2-compat-ioctl32.c b/drivers/media/video/v4l2-compat-ioctl32.c index e6f67aa..fd157cb 100644 --- a/drivers/media/video/v4l2-compat-ioctl32.c +++ b/drivers/media/video/v4l2-compat-ioctl32.c @@ -954,6 +954,7 @@ long v4l2_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg) case VIDIOC_S_FBUF32: case VIDIOC_OVERLAY32: case VIDIOC_QBUF32:

case VIDIOC_EXPBUF: case VIDIOC_DQBUF32: case VIDIOC_STREAMON32: case VIDIOC_STREAMOFF32:

diff --git a/drivers/media/video/v4l2-ioctl.c b/drivers/media/video/v4l2-ioctl.c index 74cab51..a125016 100644 --- a/drivers/media/video/v4l2-ioctl.c +++ b/drivers/media/video/v4l2-ioctl.c @@ -207,6 +207,7 @@ static const char *v4l2_ioctls[] = { [_IOC_NR(VIDIOC_S_FBUF)] = "VIDIOC_S_FBUF", [_IOC_NR(VIDIOC_OVERLAY)] = "VIDIOC_OVERLAY", [_IOC_NR(VIDIOC_QBUF)] = "VIDIOC_QBUF",

[_IOC_NR(VIDIOC_EXPBUF)] = "VIDIOC_EXPBUF", [_IOC_NR(VIDIOC_DQBUF)] = "VIDIOC_DQBUF", [_IOC_NR(VIDIOC_STREAMON)] = "VIDIOC_STREAMON", [_IOC_NR(VIDIOC_STREAMOFF)] = "VIDIOC_STREAMOFF",

@@ -938,6 +939,16 @@ static long __video_do_ioctl(struct file *file, dbgbuf(cmd, vfd, p); break; }
case VIDIOC_EXPBUF:

{
struct v4l2_exportbuffer *p = arg;
if (!ops->vidioc_expbuf)
	break;
ret = ops->vidioc_expbuf(file, fh, p);
break;
} case VIDIOC_DQBUF: { struct v4l2_buffer *p = arg;
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer { #define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000 +/**
struct v4l2_exportbuffer - export of video buffer as DMABUF file descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory == V4L2_MEMORY_MMAP;
offset from the start of the device memory for this plane,
(or a "cookie" that should be passed to mmap() as offset)
Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved fields

must be set to zero.

*/
+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

__u32 mem_offset;

__u32 reserved[13];

+};

/*

O V E R L A Y P R E V I E W

*/ @@ -2303,6 +2322,7 @@ struct v4l2_create_buffers { #define VIDIOC_S_FBUF _IOW('V', 11, struct v4l2_framebuffer) #define VIDIOC_OVERLAY _IOW('V', 14, int) #define VIDIOC_QBUF _IOWR('V', 15, struct v4l2_buffer) +#define VIDIOC_EXPBUF _IOWR('V', 16, struct v4l2_exportbuffer) #define VIDIOC_DQBUF _IOWR('V', 17, struct v4l2_buffer) #define VIDIOC_STREAMON _IOW('V', 18, int) #define VIDIOC_STREAMOFF _IOW('V', 19, int) diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h index 4df031a..d8716c6f 100644 --- a/include/media/v4l2-ioctl.h +++ b/include/media/v4l2-ioctl.h @@ -120,6 +120,8 @@ struct v4l2_ioctl_ops { int (*vidioc_reqbufs) (struct file *file, void *fh, struct v4l2_requestbuffers *b); int (*vidioc_querybuf)(struct file *file, void *fh, struct v4l2_buffer *b); int (*vidioc_qbuf) (struct file *file, void *fh, struct v4l2_buffer *b);
int (*vidioc_expbuf) (struct file *file, void *fh,
		struct v4l2_exportbuffer *e);
int (*vidioc_dqbuf) (struct file *file, void *fh, struct v4l2_buffer *b);
int (*vidioc_create_bufs)(struct file *file, void *fh, struct v4l2_create_buffers *b); -- 1.7.5.4

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Laurent Pinchart

22 Mar 22 Mar

11:16 a.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

...

This patch adds extension to V4L2 api. It allow to export a mmap buffer as file descriptor. New ioctl VIDIOC_EXPBUF is added. It takes a buffer offset used by mmap and return a file descriptor on success.

I know code is more fun to write than documentation, but Documentation/DocBook/media/v4l will be sad if this patch is merged as-is ;-)

...

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-ioctl.c | 11 +++++++++++ include/linux/videodev2.h | 20 ++++++++++++++++++++ include/media/v4l2-ioctl.h | 2 ++ 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/v4l2-compat-ioctl32.c b/drivers/media/video/v4l2-compat-ioctl32.c index e6f67aa..fd157cb 100644 --- a/drivers/media/video/v4l2-compat-ioctl32.c +++ b/drivers/media/video/v4l2-compat-ioctl32.c @@ -954,6 +954,7 @@ long v4l2_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg) case VIDIOC_S_FBUF32: case VIDIOC_OVERLAY32: case VIDIOC_QBUF32:

case VIDIOC_EXPBUF: case VIDIOC_DQBUF32: case VIDIOC_STREAMON32: case VIDIOC_STREAMOFF32:

diff --git a/drivers/media/video/v4l2-ioctl.c b/drivers/media/video/v4l2-ioctl.c index 74cab51..a125016 100644 --- a/drivers/media/video/v4l2-ioctl.c +++ b/drivers/media/video/v4l2-ioctl.c @@ -207,6 +207,7 @@ static const char *v4l2_ioctls[] = { [_IOC_NR(VIDIOC_S_FBUF)] = "VIDIOC_S_FBUF", [_IOC_NR(VIDIOC_OVERLAY)] = "VIDIOC_OVERLAY", [_IOC_NR(VIDIOC_QBUF)] = "VIDIOC_QBUF",

[_IOC_NR(VIDIOC_EXPBUF)] = "VIDIOC_EXPBUF", [_IOC_NR(VIDIOC_DQBUF)] = "VIDIOC_DQBUF", [_IOC_NR(VIDIOC_STREAMON)] = "VIDIOC_STREAMON", [_IOC_NR(VIDIOC_STREAMOFF)] = "VIDIOC_STREAMOFF",

@@ -938,6 +939,16 @@ static long __video_do_ioctl(struct file *file, dbgbuf(cmd, vfd, p); break; }
case VIDIOC_EXPBUF:

{
struct v4l2_exportbuffer *p = arg;
if (!ops->vidioc_expbuf)
	break;
ret = ops->vidioc_expbuf(file, fh, p);

You can pass arg to ops->vidioc_expbuf() directly, there's no need to create a struct v4l2_exportbuffer *p variable.

...

break;
} case VIDIOC_DQBUF: { struct v4l2_buffer *p = arg;
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer { #define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/**

struct v4l2_exportbuffer - export of video buffer as DMABUF file

descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory ==

V4L2_MEMORY_MMAP;

I don't think we will ever support exporting anything else than V4L2_MEMORY_MMAP buffers. What will happen for multiplanar buffers ?

...

offset from the start of the device memory for this plane,

(or a "cookie" that should be passed to mmap() as offset)

Shouldn't the mem_offset field always be set to the mmap cookie value ? I'm a bit confused by the "or" part, it seems to have been copied from the v4l2_buffer documentation directly. We should clarify that.

...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?

...

__u32 mem_offset;

__u32 reserved[13];

+};

/*

O V E R L A Y P R E V I E W

*/ @@ -2303,6 +2322,7 @@ struct v4l2_create_buffers { #define VIDIOC_S_FBUF _IOW('V', 11, struct v4l2_framebuffer) #define VIDIOC_OVERLAY _IOW('V', 14, int) #define VIDIOC_QBUF _IOWR('V', 15, struct v4l2_buffer) +#define VIDIOC_EXPBUF _IOWR('V', 16, struct v4l2_exportbuffer) #define VIDIOC_DQBUF _IOWR('V', 17, struct v4l2_buffer) #define VIDIOC_STREAMON _IOW('V', 18, int) #define VIDIOC_STREAMOFF _IOW('V', 19, int) diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h index 4df031a..d8716c6f 100644 --- a/include/media/v4l2-ioctl.h +++ b/include/media/v4l2-ioctl.h @@ -120,6 +120,8 @@ struct v4l2_ioctl_ops { int (*vidioc_reqbufs) (struct file *file, void *fh, struct v4l2_requestbuffers *b); int (*vidioc_querybuf)(struct file *file, void *fh, struct v4l2_buffer *b); int (*vidioc_qbuf) (struct file *file, void *fh, struct v4l2_buffer *b);
int (*vidioc_expbuf) (struct file *file, void *fh,
		struct v4l2_exportbuffer *e);
int (*vidioc_dqbuf) (struct file *file, void *fh, struct v4l2_buffer
*b);

int (*vidioc_create_bufs)(struct file *file, void *fh, struct v4l2_create_buffers *b);

-- Regards, Laurent Pinchart

Subash Patel

1:57 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

Hi Thomasz,

On 03/22/2012 04:46 PM, Laurent Pinchart wrote:

...

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

...
This patch adds extension to V4L2 api. It allow to export a mmap buffer as file descriptor. New ioctl VIDIOC_EXPBUF is added. It takes a buffer offset used by mmap and return a file descriptor on success.

I know code is more fun to write than documentation, but Documentation/DocBook/media/v4l will be sad if this patch is merged as-is ;-)

...
Signed-off-by: Tomasz Stanislawskit.stanislaws@samsung.com Signed-off-by: Kyungmin Parkkyungmin.park@samsung.com

drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-ioctl.c | 11 +++++++++++ include/linux/videodev2.h | 20 ++++++++++++++++++++ include/media/v4l2-ioctl.h | 2 ++ 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/v4l2-compat-ioctl32.c b/drivers/media/video/v4l2-compat-ioctl32.c index e6f67aa..fd157cb 100644 --- a/drivers/media/video/v4l2-compat-ioctl32.c +++ b/drivers/media/video/v4l2-compat-ioctl32.c @@ -954,6 +954,7 @@ long v4l2_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg) case VIDIOC_S_FBUF32: case VIDIOC_OVERLAY32: case VIDIOC_QBUF32:

case VIDIOC_EXPBUF: case VIDIOC_DQBUF32: case VIDIOC_STREAMON32: case VIDIOC_STREAMOFF32:

diff --git a/drivers/media/video/v4l2-ioctl.c b/drivers/media/video/v4l2-ioctl.c index 74cab51..a125016 100644 --- a/drivers/media/video/v4l2-ioctl.c +++ b/drivers/media/video/v4l2-ioctl.c @@ -207,6 +207,7 @@ static const char *v4l2_ioctls[] = { [_IOC_NR(VIDIOC_S_FBUF)] = "VIDIOC_S_FBUF", [_IOC_NR(VIDIOC_OVERLAY)] = "VIDIOC_OVERLAY", [_IOC_NR(VIDIOC_QBUF)] = "VIDIOC_QBUF",

[_IOC_NR(VIDIOC_EXPBUF)] = "VIDIOC_EXPBUF", [_IOC_NR(VIDIOC_DQBUF)] = "VIDIOC_DQBUF", [_IOC_NR(VIDIOC_STREAMON)] = "VIDIOC_STREAMON", [_IOC_NR(VIDIOC_STREAMOFF)] = "VIDIOC_STREAMOFF",

@@ -938,6 +939,16 @@ static long __video_do_ioctl(struct file *file, dbgbuf(cmd, vfd, p); break; }
case VIDIOC_EXPBUF:

{
struct v4l2_exportbuffer *p = arg;
if (!ops->vidioc_expbuf)
	break;
ret = ops->vidioc_expbuf(file, fh, p);
You can pass arg to ops->vidioc_expbuf() directly, there's no need to create a struct v4l2_exportbuffer *p variable.

...
break;
} case VIDIOC_DQBUF: { struct v4l2_buffer *p = arg;
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer { #define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/**

struct v4l2_exportbuffer - export of video buffer as DMABUF file

descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory ==

V4L2_MEMORY_MMAP;
I don't think we will ever support exporting anything else than V4L2_MEMORY_MMAP buffers. What will happen for multiplanar buffers ?

...
offset from the start of the device memory for this plane,
(or a "cookie" that should be passed to mmap() as offset)
Shouldn't the mem_offset field always be set to the mmap cookie value ? I'm a bit confused by the "or" part, it seems to have been copied from the v4l2_buffer documentation directly. We should clarify that.

...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?

+1 to Laurent. Any particular need for reserved0 and reserved[13] below? I think one void user pointer is sufficient even for future need.

...

...

__u32 mem_offset;

__u32 reserved[13];

+};

Also, what is the reason for returning the fd through this structure? To keep it aligned with other v4l2 calls? I liked(or now hate making change in the app) how it was being returned through the ioctl in your last patch.

...

...
/*

O V E R L A Y P R E V I E W

*/ @@ -2303,6 +2322,7 @@ struct v4l2_create_buffers { #define VIDIOC_S_FBUF _IOW('V', 11, struct v4l2_framebuffer) #define VIDIOC_OVERLAY _IOW('V', 14, int) #define VIDIOC_QBUF _IOWR('V', 15, struct v4l2_buffer) +#define VIDIOC_EXPBUF _IOWR('V', 16, struct v4l2_exportbuffer) #define VIDIOC_DQBUF _IOWR('V', 17, struct v4l2_buffer) #define VIDIOC_STREAMON _IOW('V', 18, int) #define VIDIOC_STREAMOFF _IOW('V', 19, int) diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h index 4df031a..d8716c6f 100644 --- a/include/media/v4l2-ioctl.h +++ b/include/media/v4l2-ioctl.h @@ -120,6 +120,8 @@ struct v4l2_ioctl_ops { int (*vidioc_reqbufs) (struct file *file, void *fh, struct v4l2_requestbuffers *b); int (*vidioc_querybuf)(struct file *file, void *fh, struct v4l2_buffer *b); int (*vidioc_qbuf) (struct file *file, void *fh, struct v4l2_buffer *b);
int (*vidioc_expbuf) (struct file *file, void *fh,
		struct v4l2_exportbuffer *e);
int (*vidioc_dqbuf) (struct file *file, void *fh, struct v4l2_buffer
*b);

int (*vidioc_create_bufs)(struct file *file, void *fh, struct v4l2_create_buffers *b);

Regards, Subash

Laurent Pinchart

2:07 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

Hi Subash,

On Thursday 22 March 2012 19:27:01 Subash Patel wrote:

...

On 03/22/2012 04:46 PM, Laurent Pinchart wrote:

...
On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

[snip]

...

...
...
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer {

#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/**

struct v4l2_exportbuffer - export of video buffer as DMABUF file

descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory ==

V4L2_MEMORY_MMAP;

I don't think we will ever support exporting anything else than V4L2_MEMORY_MMAP buffers. What will happen for multiplanar buffers ?

...
offset from the start of the device memory for this plane,
(or a "cookie" that should be passed to mmap() as offset)
Shouldn't the mem_offset field always be set to the mmap cookie value ? I'm a bit confused by the "or" part, it seems to have been copied from the v4l2_buffer documentation directly. We should clarify that.

...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?
+1 to Laurent. Any particular need for reserved0 and reserved[13] below? I think one void user pointer is sufficient even for future need.

I'd rather have more than one void user pointer, just in case. A couple of bytes won't be expensive, and they will save lots of hassle in the future if we need to add a couple of fields. I was just wondering why there was a reserved field between fd and mem_offset.

...

...
...

__u32 mem_offset;

__u32 reserved[13];

+};

Also, what is the reason for returning the fd through this structure? To keep it aligned with other v4l2 calls? I liked(or now hate making change in the app) how it was being returned through the ioctl in your last patch.

Probably to be consistent with the V4L2 API, yes. It won't make a big difference for applications, I would favor consistency in this case.

-- Regards, Laurent Pinchart

Daniel Vetter

2:26 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

On Thu, Mar 22, 2012 at 03:07:18PM +0100, Laurent Pinchart wrote:

...

...
...
...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?

+1 to Laurent. Any particular need for reserved0 and reserved[13] below? I think one void user pointer is sufficient even for future need.

I'd rather have more than one void user pointer, just in case. A couple of bytes won't be expensive, and they will save lots of hassle in the future if we need to add a couple of fields. I was just wondering why there was a reserved field between fd and mem_offset.

Quick comment from a random bystander: __u32 is not big enough for a void* pointer from userspace ... -Daniel

-- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48

Laurent Pinchart

2:43 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

On Thursday 22 March 2012 15:26:08 Daniel Vetter wrote:

...

On Thu, Mar 22, 2012 at 03:07:18PM +0100, Laurent Pinchart wrote:

...
...
...
...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All

reserved fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?

+1 to Laurent. Any particular need for reserved0 and reserved[13] below? I think one void user pointer is sufficient even for future need.

I'd rather have more than one void user pointer, just in case. A couple of bytes won't be expensive, and they will save lots of hassle in the future if we need to add a couple of fields. I was just wondering why there was a reserved field between fd and mem_offset.

Quick comment from a random bystander: __u32 is not big enough for a void* pointer from userspace ...

That's why I'm happy with 14 reserved __u32. That should be enough for userspace pointers in the foreseable future :-)

-- Regards, Laurent Pinchart

Subash Patel

2:59 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

On 03/22/2012 07:37 PM, Laurent Pinchart wrote:

...

Hi Subash,

On Thursday 22 March 2012 19:27:01 Subash Patel wrote:

...
On 03/22/2012 04:46 PM, Laurent Pinchart wrote:

...
On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

[snip]

...
...
...
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer {

#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/**

struct v4l2_exportbuffer - export of video buffer as DMABUF file

descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory ==

V4L2_MEMORY_MMAP;

I don't think we will ever support exporting anything else than V4L2_MEMORY_MMAP buffers. What will happen for multiplanar buffers ?

...
offset from the start of the device memory for this plane,
(or a "cookie" that should be passed to mmap() as offset)
Shouldn't the mem_offset field always be set to the mmap cookie value ? I'm a bit confused by the "or" part, it seems to have been copied from the v4l2_buffer documentation directly. We should clarify that.

...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?
+1 to Laurent. Any particular need for reserved0 and reserved[13] below? I think one void user pointer is sufficient even for future need.
I'd rather have more than one void user pointer, just in case. A couple of bytes won't be expensive,

Just an off-topic note. When I learnt to hit keyboard for programming(in linux for embedded), I had strict guidelines not to declare variables as I like, as memory and computing was very precious then. A decade later, people no more think its expensive to keep 14*3 bytes*(who knows how many dma_buf objects) in the system. Just a side note, thats all :)

and they will save lots of hassle in the future if

...

we need to add a couple of fields. I was just wondering why there was a reserved field between fd and mem_offset.

...
...
...

__u32 mem_offset;

__u32 reserved[13];

+};

Also, what is the reason for returning the fd through this structure? To keep it aligned with other v4l2 calls? I liked(or now hate making change in the app) how it was being returned through the ioctl in your last patch.

Probably to be consistent with the V4L2 API, yes. It won't make a big difference for applications, I would favor consistency in this case.

Laurent Pinchart

3:09 p.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

Hi Subash,

On Thursday 22 March 2012 20:29:57 Subash Patel wrote:

...

On 03/22/2012 07:37 PM, Laurent Pinchart wrote:

...
On Thursday 22 March 2012 19:27:01 Subash Patel wrote:

...
On 03/22/2012 04:46 PM, Laurent Pinchart wrote:

...
On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

[snip]

...
...
...
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer {

#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/**

struct v4l2_exportbuffer - export of video buffer as DMABUF file

descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory ==

V4L2_MEMORY_MMAP;

I don't think we will ever support exporting anything else than V4L2_MEMORY_MMAP buffers. What will happen for multiplanar buffers ?

...
offset from the start of the device memory for this plane,
(or a "cookie" that should be passed to mmap() as offset)
Shouldn't the mem_offset field always be set to the mmap cookie value ? I'm a bit confused by the "or" part, it seems to have been copied from the v4l2_buffer documentation directly. We should clarify that.

...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?
+1 to Laurent. Any particular need for reserved0 and reserved[13] below? I think one void user pointer is sufficient even for future need.
I'd rather have more than one void user pointer, just in case. A couple of bytes won't be expensive,
Just an off-topic note. When I learnt to hit keyboard for programming(in linux for embedded), I had strict guidelines not to declare variables as I like, as memory and computing was very precious then.

Somewhere on my TODO list is a Kalman filter implementation for a microcontroller with 512 bytes of RAM. I know what you mean :-)

...

A decade later, people no more think its expensive to keep 14*3 bytes*(who knows how many dma_buf objects) in the system. Just a side note, thats all :)

For objects that will exist in many instances, saving memory is important (you would need to be really convincing to add a single bit to struct page for instance), but struct v4l2_exportbuffer is only used to hold parameters for an ioctl. That's temporary memory, so I think we can spare a couple of extra bytes if it can help with ABI stability.

...

...
and they will save lots of hassle in the future if we need to add a couple of fields. I was just wondering why there was a reserved field between fd and mem_offset.

...
...
...

__u32 mem_offset;

__u32 reserved[13];

+};

Also, what is the reason for returning the fd through this structure? To keep it aligned with other v4l2 calls? I liked(or now hate making change in the app) how it was being returned through the ioctl in your last patch.

Probably to be consistent with the V4L2 API, yes. It won't make a big difference for applications, I would favor consistency in this case.

-- Regards, Laurent Pinchart

Tomasz Stanislawski

23 Mar 23 Mar

11:33 a.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

Hi Laurent, Please refer to the comments below.

On 03/22/2012 12:16 PM, Laurent Pinchart wrote:

...

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

...
This patch adds extension to V4L2 api. It allow to export a mmap buffer as file descriptor. New ioctl VIDIOC_EXPBUF is added. It takes a buffer offset used by mmap and return a file descriptor on success.

I know code is more fun to write than documentation, but Documentation/DocBook/media/v4l will be sad if this patch is merged as-is ;-)

Ok. I will prepare the documentation just after there will be consensus about shape of buffer exporting API :).

...

...
Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-ioctl.c | 11 +++++++++++ include/linux/videodev2.h | 20 ++++++++++++++++++++ include/media/v4l2-ioctl.h | 2 ++ 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/v4l2-compat-ioctl32.c b/drivers/media/video/v4l2-compat-ioctl32.c index e6f67aa..fd157cb 100644 --- a/drivers/media/video/v4l2-compat-ioctl32.c +++ b/drivers/media/video/v4l2-compat-ioctl32.c @@ -954,6 +954,7 @@ long v4l2_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg) case VIDIOC_S_FBUF32: case VIDIOC_OVERLAY32: case VIDIOC_QBUF32:

case VIDIOC_EXPBUF: case VIDIOC_DQBUF32: case VIDIOC_STREAMON32: case VIDIOC_STREAMOFF32:

diff --git a/drivers/media/video/v4l2-ioctl.c b/drivers/media/video/v4l2-ioctl.c index 74cab51..a125016 100644 --- a/drivers/media/video/v4l2-ioctl.c +++ b/drivers/media/video/v4l2-ioctl.c @@ -207,6 +207,7 @@ static const char *v4l2_ioctls[] = { [_IOC_NR(VIDIOC_S_FBUF)] = "VIDIOC_S_FBUF", [_IOC_NR(VIDIOC_OVERLAY)] = "VIDIOC_OVERLAY", [_IOC_NR(VIDIOC_QBUF)] = "VIDIOC_QBUF",

[_IOC_NR(VIDIOC_EXPBUF)] = "VIDIOC_EXPBUF", [_IOC_NR(VIDIOC_DQBUF)] = "VIDIOC_DQBUF", [_IOC_NR(VIDIOC_STREAMON)] = "VIDIOC_STREAMON", [_IOC_NR(VIDIOC_STREAMOFF)] = "VIDIOC_STREAMOFF",

@@ -938,6 +939,16 @@ static long __video_do_ioctl(struct file *file, dbgbuf(cmd, vfd, p); break; }
case VIDIOC_EXPBUF:

{
struct v4l2_exportbuffer *p = arg;
if (!ops->vidioc_expbuf)
	break;
ret = ops->vidioc_expbuf(file, fh, p);
You can pass arg to ops->vidioc_expbuf() directly, there's no need to create a struct v4l2_exportbuffer *p variable.

No problem. I tried to follow style of other ioctls. Notice that adding this temporary variable provides some form of type checking. I mean using a proper structure for a proper callback.

...

...
break;
} case VIDIOC_DQBUF: { struct v4l2_buffer *p = arg;
diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index bb6844e..e71c787 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -680,6 +680,25 @@ struct v4l2_buffer { #define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN 0x1000

+/**

struct v4l2_exportbuffer - export of video buffer as DMABUF file

descriptor

@fd: file descriptor associated with DMABUF (set by driver)

@mem_offset: for non-multiplanar buffers with memory ==

V4L2_MEMORY_MMAP;
I don't think we will ever support exporting anything else than V4L2_MEMORY_MMAP buffers. What will happen for multiplanar buffers ?

Every plane is described by its separate mem-offset. So planes are exported as separate DMABUF. I will update field description.

...

...
offset from the start of the device memory for this plane,
(or a "cookie" that should be passed to mmap() as offset)
Shouldn't the mem_offset field always be set to the mmap cookie value ? I'm a bit confused by the "or" part, it seems to have been copied from the v4l2_buffer documentation directly. We should clarify that.

Ok. I agree.

...

...

Contains data used for exporting a video buffer as DMABUF file

descriptor. Uses the same 'cookie' as mmap() syscall. All reserved

fields

must be set to zero.

*/

+struct v4l2_exportbuffer {

__u32 fd;

__u32 reserved0;

Why is there a reserved field here ?

I expected that struct v4l2_requestbuffer might allow exporting buffers described by something else than 'mmap cookie'. Union could be used for different schemes i.e.

struct v4l2_exportbuffer { __u32 fd; __u32 type; union { __u32 mem_offset; __u32 reserved[14]; /* other descriptions */ } m; };

Adding reserved0 field will allow to introduce field type, and keep the union at 64-bit aligned address to keep compatibility between 32/64bit libraries. Field type == 0 would indicate description of a buffer using 'mmap offset'.

...

...
__u32 mem_offset;

__u32 reserved[13];

+};

/*

O V E R L A Y P R E V I E W

*/ @@ -2303,6 +2322,7 @@ struct v4l2_create_buffers { #define VIDIOC_S_FBUF _IOW('V', 11, struct v4l2_framebuffer) #define VIDIOC_OVERLAY _IOW('V', 14, int) #define VIDIOC_QBUF _IOWR('V', 15, struct v4l2_buffer) +#define VIDIOC_EXPBUF _IOWR('V', 16, struct v4l2_exportbuffer) #define VIDIOC_DQBUF _IOWR('V', 17, struct v4l2_buffer) #define VIDIOC_STREAMON _IOW('V', 18, int) #define VIDIOC_STREAMOFF _IOW('V', 19, int) diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h index 4df031a..d8716c6f 100644 --- a/include/media/v4l2-ioctl.h +++ b/include/media/v4l2-ioctl.h @@ -120,6 +120,8 @@ struct v4l2_ioctl_ops { int (*vidioc_reqbufs) (struct file *file, void *fh, struct v4l2_requestbuffers *b); int (*vidioc_querybuf)(struct file *file, void *fh, struct v4l2_buffer *b); int (*vidioc_qbuf) (struct file *file, void *fh, struct v4l2_buffer *b);
int (*vidioc_expbuf) (struct file *file, void *fh,
		struct v4l2_exportbuffer *e);
int (*vidioc_dqbuf) (struct file *file, void *fh, struct v4l2_buffer
*b);

int (*vidioc_create_bufs)(struct file *file, void *fh, struct v4l2_create_buffers *b);

Regards, Tomasz Stanislawski

Laurent Pinchart

27 Mar 27 Mar

10:21 a.m.

New subject: [RFCv2 PATCH 4/9] v4l: add buffer exporting via dmabuf

Hi Tomasz,

On Friday 23 March 2012 12:33:24 Tomasz Stanislawski wrote:

...

On 03/22/2012 12:16 PM, Laurent Pinchart wrote:

...
On Tuesday 13 March 2012 11:17:02 Tomasz Stanislawski wrote:

[snip]

...

...
...
case VIDIOC_EXPBUF:

{
struct v4l2_exportbuffer *p = arg;
if (!ops->vidioc_expbuf)
	break;
ret = ops->vidioc_expbuf(file, fh, p);
You can pass arg to ops->vidioc_expbuf() directly, there's no need to create a struct v4l2_exportbuffer *p variable.
No problem. I tried to follow style of other ioctls. Notice that adding this temporary variable provides some form of type checking. I mean using a proper structure for a proper callback.

It makes sure that the argument passed to video_expbuf is indeed a v4l2_exportbuffer, but it doesn't check that the arg pointer you assign to p points to the right type, so it's a bit pointless in my opinion.

This construct makes sense if you need to access field of the arg pointer here before or after calling the operation.

...

...
...
break;
}

-- Regards, Laurent Pinchart

Tomasz Stanislawski

13 Mar 13 Mar

10:17 a.m.

New subject: [RFCv2 PATCH 5/9] v4l: vb2: add buffer exporting via dmabuf

This patch adds extension to videobuf2-core. It allow to export a mmap buffer as a file descriptor.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/videobuf2-core.c | 64 ++++++++++++++++++++++++++++++++++ include/media/videobuf2-core.h | 2 + 2 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-core.c b/drivers/media/video/videobuf2-core.c index e7df560..41c4bf8 100644 --- a/drivers/media/video/videobuf2-core.c +++ b/drivers/media/video/videobuf2-core.c @@ -1553,6 +1553,70 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking) } EXPORT_SYMBOL_GPL(vb2_dqbuf);

+static int __find_plane_by_offset(struct vb2_queue *q, unsigned long off, + unsigned int *_buffer, unsigned int *_plane); + +/** + * vb2_expbuf() - Export a buffer as a file descriptor + * @q: videobuf2 queue + * @b: export buffer structure passed from userspace to vidioc_expbuf + * handler in driver + * + * The return values from this function are intended to be directly returned + * from vidioc_expbuf handler in driver. + */ +int vb2_expbuf(struct vb2_queue *q, struct v4l2_exportbuffer *eb) +{ + struct vb2_buffer *vb = NULL; + struct vb2_plane *vb_plane; + unsigned int buffer, plane; + int ret; + struct dma_buf *dbuf; + + if (q->memory != V4L2_MEMORY_MMAP) { + dprintk(1, "Queue is not currently set up for mmap\n"); + return -EINVAL; + } + + if (!q->mem_ops->get_dmabuf) { + dprintk(1, "Queue does not support DMA buffer exporting\n"); + return -EINVAL; + } + + /* + * Find the plane corresponding to the offset passed by userspace. + */ + ret = __find_plane_by_offset(q, eb->mem_offset, &buffer, &plane); + if (ret) { + dprintk(1, "invalid offset %u\n", eb->mem_offset); + return ret; + } + + vb = q->bufs[buffer]; + vb_plane = &vb->planes[plane]; + + dbuf = call_memop(q, get_dmabuf, vb_plane->mem_priv); + if (IS_ERR_OR_NULL(dbuf)) { + dprintk(1, "Failed to export buffer %d, plane %d\n", + buffer, plane); + return -EINVAL; + } + + ret = dma_buf_fd(dbuf); + if (ret < 0) { + dprintk(3, "buffer %d, plane %d failed to export (%d)\n", + buffer, plane, ret); + return ret; + } + + dprintk(3, "buffer %d, plane %d exported as %d descriptor\n", + buffer, plane, ret); + eb->fd = ret; + + return 0; +} +EXPORT_SYMBOL_GPL(vb2_expbuf); + /** * __vb2_queue_cancel() - cancel and stop (pause) streaming * diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index 412c6a4..548252b 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -79,6 +79,7 @@ struct vb2_mem_ops { void (*prepare)(void *buf_priv); void (*finish)(void *buf_priv); void (*put)(void *buf_priv); + struct dma_buf *(*get_dmabuf)(void *buf_priv);

void *(*get_userptr)(void *alloc_ctx, unsigned long vaddr, unsigned long size, int write); @@ -348,6 +349,7 @@ int vb2_queue_init(struct vb2_queue *q); void vb2_queue_release(struct vb2_queue *q);

int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b); +int vb2_expbuf(struct vb2_queue *q, struct v4l2_exportbuffer *eb); int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking);

int vb2_streamon(struct vb2_queue *q, enum v4l2_buf_type type);

-- 1.7.5.4

Laurent Pinchart

22 Mar 22 Mar

11:24 a.m.

New subject: [RFCv2 PATCH 5/9] v4l: vb2: add buffer exporting via dmabuf

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:03 Tomasz Stanislawski wrote:

...

This patch adds extension to videobuf2-core. It allow to export a mmap buffer as a file descriptor.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/videobuf2-core.c | 64 ++++++++++++++++++++++++++++++++++ include/media/videobuf2-core.h | 2 + 2 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-core.c b/drivers/media/video/videobuf2-core.c index e7df560..41c4bf8 100644 --- a/drivers/media/video/videobuf2-core.c +++ b/drivers/media/video/videobuf2-core.c @@ -1553,6 +1553,70 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking) } EXPORT_SYMBOL_GPL(vb2_dqbuf);

+static int __find_plane_by_offset(struct vb2_queue *q, unsigned long off,
	unsigned int *_buffer, unsigned int *_plane);

Could you please move __find_plane_by_offset() up or move vb2_expbuf() down to avoid the forward declaration ? The later might make more sense, you could declare vb2_expbuf() right after vb2_mmap() (here and in videobuf2-core.h), both functions perform similar tasks.

...

+/**
vb2_expbuf() - Export a buffer as a file descriptor

@q: videobuf2 queue

@b: export buffer structure passed from userspace to vidioc_expbuf
handler in driver
The return values from this function are intended to be directly
returned

from vidioc_expbuf handler in driver.

*/

+int vb2_expbuf(struct vb2_queue *q, struct v4l2_exportbuffer *eb) +{
struct vb2_buffer *vb = NULL;

struct vb2_plane *vb_plane;

unsigned int buffer, plane;

int ret;

struct dma_buf *dbuf;

if (q->memory != V4L2_MEMORY_MMAP) {
dprintk(1, "Queue is not currently set up for mmap\n");
return -EINVAL;
}

if (!q->mem_ops->get_dmabuf) {
dprintk(1, "Queue does not support DMA buffer exporting\n");
return -EINVAL;
}

/*
* Find the plane corresponding to the offset passed by userspace.
*/
ret = __find_plane_by_offset(q, eb->mem_offset, &buffer, &plane);

if (ret) {
dprintk(1, "invalid offset %u\n", eb->mem_offset);
return ret;
}

vb = q->bufs[buffer];

vb_plane = &vb->planes[plane];

dbuf = call_memop(q, get_dmabuf, vb_plane->mem_priv);

if (IS_ERR_OR_NULL(dbuf)) {
dprintk(1, "Failed to export buffer %d, plane %d\n",
	buffer, plane);
return -EINVAL;
}

ret = dma_buf_fd(dbuf);

if (ret < 0) {
dprintk(3, "buffer %d, plane %d failed to export (%d)\n",
	buffer, plane, ret);
return ret;
}

dprintk(3, "buffer %d, plane %d exported as %d descriptor\n",
buffer, plane, ret);
eb->fd = ret;

return 0;
+} +EXPORT_SYMBOL_GPL(vb2_expbuf);

/**

__vb2_queue_cancel() - cancel and stop (pause) streaming

diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index 412c6a4..548252b 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -79,6 +79,7 @@ struct vb2_mem_ops { void (*prepare)(void *buf_priv); void (*finish)(void *buf_priv); void (*put)(void *buf_priv);

struct dma_buf *(*get_dmabuf)(void *buf_priv);

void *(*get_userptr)(void *alloc_ctx, unsigned long vaddr, unsigned long size, int write);

@@ -348,6 +349,7 @@ int vb2_queue_init(struct vb2_queue *q); void vb2_queue_release(struct vb2_queue *q);

int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b); +int vb2_expbuf(struct vb2_queue *q, struct v4l2_exportbuffer *eb); int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking);

int vb2_streamon(struct vb2_queue *q, enum v4l2_buf_type type);

-- Regards, Laurent Pinchart

Tomasz Stanislawski

23 Mar 23 Mar

11:50 a.m.

New subject: [RFCv2 PATCH 5/9] v4l: vb2: add buffer exporting via dmabuf

Hi Laurent, Thank you for you comments.

On 03/22/2012 12:24 PM, Laurent Pinchart wrote:

...

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:03 Tomasz Stanislawski wrote:

...
This patch adds extension to videobuf2-core. It allow to export a mmap buffer as a file descriptor.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/videobuf2-core.c | 64 ++++++++++++++++++++++++++++++++++ include/media/videobuf2-core.h | 2 + 2 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-core.c b/drivers/media/video/videobuf2-core.c index e7df560..41c4bf8 100644 --- a/drivers/media/video/videobuf2-core.c +++ b/drivers/media/video/videobuf2-core.c @@ -1553,6 +1553,70 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking) } EXPORT_SYMBOL_GPL(vb2_dqbuf);

+static int __find_plane_by_offset(struct vb2_queue *q, unsigned long off,
	unsigned int *_buffer, unsigned int *_plane);
Could you please move __find_plane_by_offset() up or move vb2_expbuf() down to avoid the forward declaration ? The later might make more sense, you could declare vb2_expbuf() right after vb2_mmap() (here and in videobuf2-core.h), both functions perform similar tasks.

Ok. I will move it.

I used the forward declaration to have only-plus patch while keeping all vb2_*buf functions together.

Regards, Tomasz Stanislawski

Tomasz Stanislawski

13 Mar 13 Mar

10:17 a.m.

New subject: [RFCv2 PATCH 6/9] v4l: vb2-dma-contig: add support for DMABUF exporting

This patch adds support for exporting a dma-contig buffer using DMABUF interface.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/videobuf2-dma-contig.c | 128 ++++++++++++++++++++++++++++ 1 files changed, 128 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index 746dd5f..d95b23a 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -31,6 +31,7 @@ struct vb2_dc_buf { /* MMAP related */ struct vb2_vmarea_handler handler; atomic_t refcount; + struct dma_buf *dma_buf; struct sg_table *sgt_base;

/* USERPTR related */ @@ -194,6 +195,8 @@ static void vb2_dc_put(void *buf_priv) if (!atomic_dec_and_test(&buf->refcount)) return;

+ if (buf->dma_buf) + dma_buf_put(buf->dma_buf); vb2_dc_release_sgtable(buf->sgt_base); dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr); @@ -309,6 +312,130 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) }

/*********************************************/ +/* DMABUF ops for exporters */ +/*********************************************/ + +struct vb2_dc_attachment { + struct sg_table sgt; + enum dma_data_direction dir; +}; + +static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev, + struct dma_buf_attachment *dbuf_attach) +{ + /* nothing to be done */ + return 0; +} + +static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf, + struct dma_buf_attachment *db_attach) +{ + struct vb2_dc_attachment *attach = db_attach->priv; + struct sg_table *sgt; + + if (!attach) + return; + + sgt = &attach->sgt; + + dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->nents, attach->dir); + sg_free_table(sgt); + kfree(attach); + db_attach->priv = NULL; +} + +static struct sg_table *vb2_dc_dmabuf_ops_map( + struct dma_buf_attachment *db_attach, enum dma_data_direction dir) +{ + struct dma_buf *dbuf = db_attach->dmabuf; + struct vb2_dc_buf *buf = dbuf->priv; + struct vb2_dc_attachment *attach = db_attach->priv; + struct sg_table *sgt; + struct scatterlist *rd, *wr; + int i, ret; + + /* return previously mapped sg table */ + if (attach) + return &attach->sgt; + + attach = kzalloc(sizeof *attach, GFP_KERNEL); + if (!attach) + return ERR_PTR(-ENOMEM); + + sgt = &attach->sgt; + attach->dir = dir; + + /* copying the buf->base_sgt to attachment */ + ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL); + if (ret) { + kfree(attach); + return ERR_PTR(-ENOMEM); + } + + rd = buf->sgt_base->sgl; + wr = sgt->sgl; + for (i = 0; i < sgt->orig_nents; ++i) { + sg_set_page(wr, sg_page(rd), rd->length, rd->offset); + rd = sg_next(rd); + wr = sg_next(wr); + } + + /* mapping new sglist to the client */ + ret = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, dir); + if (ret <= 0) { + printk(KERN_ERR "failed to map scatterlist\n"); + sg_free_table(sgt); + kfree(attach); + return ERR_PTR(-EIO); + } + + db_attach->priv = attach; + + return sgt; +} + +static void vb2_dc_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, + struct sg_table *sgt) +{ + /* nothing to be done here */ +} + +static void vb2_dc_dmabuf_ops_release(struct dma_buf *dbuf) +{ + /* drop reference obtained in vb2_dc_get_dmabuf */ + vb2_dc_put(dbuf->priv); +} + +static struct dma_buf_ops vb2_dc_dmabuf_ops = { + .attach = vb2_dc_dmabuf_ops_attach, + .detach = vb2_dc_dmabuf_ops_detach, + .map_dma_buf = vb2_dc_dmabuf_ops_map, + .unmap_dma_buf = vb2_dc_dmabuf_ops_unmap, + .release = vb2_dc_dmabuf_ops_release, +}; + +static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv) +{ + struct vb2_dc_buf *buf = buf_priv; + struct dma_buf *dbuf; + + if (buf->dma_buf) + return buf->dma_buf; + + /* dmabuf keeps reference to vb2 buffer */ + atomic_inc(&buf->refcount); + dbuf = dma_buf_export(buf, &vb2_dc_dmabuf_ops, buf->size, 0); + if (IS_ERR(dbuf)) { + atomic_dec(&buf->refcount); + return NULL; + } + + buf->dma_buf = dbuf; + + return dbuf; +} + +/*********************************************/ /* callbacks for USERPTR buffers */ /*********************************************/

@@ -603,6 +730,7 @@ static void *vb2_dc_attach_dmabuf(void *alloc_ctx, struct dma_buf *dbuf, const struct vb2_mem_ops vb2_dma_contig_memops = { .alloc = vb2_dc_alloc, .put = vb2_dc_put, + .get_dmabuf = vb2_dc_get_dmabuf, .cookie = vb2_dc_cookie, .vaddr = vb2_dc_vaddr, .mmap = vb2_dc_mmap,

-- 1.7.5.4

Tomasz Stanislawski

10:17 a.m.

New subject: [RFCv2 PATCH 7/9] v4l: vb2-dma-contig: change map/unmap behaviour

The DMABUF documentation says that the map_dma_buf callback should return scatterlist that is mapped into a caller's address space. In practice, almost none of existing implementations of DMABUF exporter does it. This patch breaks the DMABUF specification in order to allow exchange DMABUF buffers between other APIs like DRM.

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/videobuf2-dma-contig.c | 64 ++++++++++++---------------- 1 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index d95b23a..32bb16b 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -315,11 +315,6 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) /* DMABUF ops for exporters */ /*********************************************/

-struct vb2_dc_attachment { - struct sg_table sgt; - enum dma_data_direction dir; -}; - static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev, struct dma_buf_attachment *dbuf_attach) { @@ -330,17 +325,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev, static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf, struct dma_buf_attachment *db_attach) { - struct vb2_dc_attachment *attach = db_attach->priv; - struct sg_table *sgt; + struct sg_table *sgt = db_attach->priv;

- if (!attach) + if (!sgt) return;

- sgt = &attach->sgt; - - dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->nents, attach->dir); sg_free_table(sgt); - kfree(attach); + kfree(sgt); db_attach->priv = NULL; }

@@ -349,26 +340,22 @@ static struct sg_table *vb2_dc_dmabuf_ops_map( { struct dma_buf *dbuf = db_attach->dmabuf; struct vb2_dc_buf *buf = dbuf->priv; - struct vb2_dc_attachment *attach = db_attach->priv; - struct sg_table *sgt; + struct sg_table *sgt = db_attach->priv; struct scatterlist *rd, *wr; int i, ret;

/* return previously mapped sg table */ - if (attach) - return &attach->sgt; + if (sgt) + return sgt;

- attach = kzalloc(sizeof *attach, GFP_KERNEL); - if (!attach) + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) return ERR_PTR(-ENOMEM);

- sgt = &attach->sgt; - attach->dir = dir; - /* copying the buf->base_sgt to attachment */ ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL); if (ret) { - kfree(attach); + kfree(sgt); return ERR_PTR(-ENOMEM); }

@@ -380,16 +367,7 @@ static struct sg_table *vb2_dc_dmabuf_ops_map( wr = sg_next(wr); }

- /* mapping new sglist to the client */ - ret = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, dir); - if (ret <= 0) { - printk(KERN_ERR "failed to map scatterlist\n"); - sg_free_table(sgt); - kfree(attach); - return ERR_PTR(-EIO); - } - - db_attach->priv = attach; + db_attach->priv = sgt;

return sgt; } @@ -623,7 +601,7 @@ static int vb2_dc_map_dmabuf(void *mem_priv) struct vb2_dc_buf *buf = mem_priv; struct sg_table *sgt; unsigned long contig_size; - int ret = 0; + int ret = -EFAULT;

if (WARN_ON(!buf->db_attach)) { printk(KERN_ERR "trying to pin a non attached buffer\n"); @@ -642,12 +620,20 @@ static int vb2_dc_map_dmabuf(void *mem_priv) return -EINVAL; }

+ /* mapping new sglist to the client */ + sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents, + buf->dma_dir); + if (sgt->nents <= 0) { + printk(KERN_ERR "failed to map scatterlist\n"); + goto fail_map_attachment; + } + /* checking if dmabuf is big enough to store contiguous chunk */ contig_size = vb2_dc_get_contiguous_size(sgt); if (contig_size < buf->size) { - printk(KERN_ERR "contiguous chunk of dmabuf is too small\n"); - ret = -EFAULT; - goto fail_map; + printk(KERN_ERR "contiguous chunk of dmabuf is too small " + "%lu/%lu bytes\n", contig_size, buf->size); + goto fail_map_sg; }

buf->dma_addr = sg_dma_address(sgt->sgl); @@ -655,7 +641,10 @@ static int vb2_dc_map_dmabuf(void *mem_priv)

return 0;

-fail_map: +fail_map_sg: + dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); + +fail_map_attachment: dma_buf_unmap_attachment(buf->db_attach, sgt);

return ret; @@ -676,6 +665,7 @@ static void vb2_dc_unmap_dmabuf(void *mem_priv) return; }

+ dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); dma_buf_unmap_attachment(buf->db_attach, sgt);

buf->dma_addr = 0;

-- 1.7.5.4

Laurent Pinchart

22 Mar 22 Mar

12:15 p.m.

New subject: [RFCv2 PATCH 7/9] v4l: vb2-dma-contig: change map/unmap behaviour

Hi Tomasz,

Thanks for the patch.

On Tuesday 13 March 2012 11:17:05 Tomasz Stanislawski wrote:

...

The DMABUF documentation says that the map_dma_buf callback should return scatterlist that is mapped into a caller's address space. In practice, almost none of existing implementations of DMABUF exporter does it. This patch breaks the DMABUF specification in order to allow exchange DMABUF buffers between other APIs like DRM.

Then it's time to fix the spec, and squash 6/9 and 7/9 together (I started reviewing 6/9 and the implementation puzzled me until I saw the "fixes" in 7/9).

We need to agree on a behaviour for the mapping API. I've thought from the beginning that mapping the buffer to the importer's device address space should be the responsibility of the importer, not the exporter. If we move to that approach, we should probably rename the map and unmap functions as they won't deal with mappings anymore. I recall that pin/unpin were proposed at some point (possible during a meeting at the ELC).

One possible issue with handling mappings in the importer is that the exporter won't be able to implement any mapping/unmapping hack that might be needed. As the DMA SG API doesn't seem to support VM_PFNMAP memory (see the explanation in my reply to 2/9), we will be left without a solution if an exporter uses memory not backed with struct page.

...

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com

drivers/media/video/videobuf2-dma-contig.c | 64 ++++++++++++---------------- 1 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c index d95b23a..32bb16b 100644 --- a/drivers/media/video/videobuf2-dma-contig.c +++ b/drivers/media/video/videobuf2-dma-contig.c @@ -315,11 +315,6 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma) /* DMABUF ops for exporters */ /*********************************************/

-struct vb2_dc_attachment {

struct sg_table sgt;

enum dma_data_direction dir;

-};

static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev, struct dma_buf_attachment *dbuf_attach) { @@ -330,17 +325,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev, static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf, struct dma_buf_attachment *db_attach) {

struct vb2_dc_attachment *attach = db_attach->priv;

struct sg_table *sgt;

struct sg_table *sgt = db_attach->priv;

if (!attach)

if (!sgt) return;

sgt = &attach->sgt;

dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->nents, attach->dir); sg_free_table(sgt);

kfree(attach);

kfree(sgt); db_attach->priv = NULL;

}

@@ -349,26 +340,22 @@ static struct sg_table *vb2_dc_dmabuf_ops_map( { struct dma_buf *dbuf = db_attach->dmabuf; struct vb2_dc_buf *buf = dbuf->priv;

struct vb2_dc_attachment *attach = db_attach->priv;

struct sg_table *sgt;

struct sg_table *sgt = db_attach->priv; struct scatterlist *rd, *wr; int i, ret;

/* return previously mapped sg table */
if (attach)
return &attach->sgt;
if (sgt)
return sgt;
attach = kzalloc(sizeof *attach, GFP_KERNEL);

if (!attach)

sgt = kzalloc(sizeof *sgt, GFP_KERNEL);

if (!sgt) return ERR_PTR(-ENOMEM);
sgt = &attach->sgt;

attach->dir = dir;

/* copying the buf->base_sgt to attachment */ ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL); if (ret) {
kfree(attach);
kfree(sgt);
return ERR_PTR(-ENOMEM); }
@@ -380,16 +367,7 @@ static struct sg_table *vb2_dc_dmabuf_ops_map( wr = sg_next(wr); }
/* mapping new sglist to the client */

ret = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, dir);

if (ret <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
sg_free_table(sgt);
kfree(attach);
return ERR_PTR(-EIO);
}

db_attach->priv = attach;
db_attach->priv = sgt;

return sgt;

} @@ -623,7 +601,7 @@ static int vb2_dc_map_dmabuf(void *mem_priv) struct vb2_dc_buf *buf = mem_priv; struct sg_table *sgt; unsigned long contig_size;

int ret = 0;

int ret = -EFAULT;

if (WARN_ON(!buf->db_attach)) { printk(KERN_ERR "trying to pin a non attached buffer\n");

@@ -642,12 +620,20 @@ static int vb2_dc_map_dmabuf(void *mem_priv) return -EINVAL; }
/* mapping new sglist to the client */

sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
buf->dma_dir);
if (sgt->nents <= 0) {
printk(KERN_ERR "failed to map scatterlist\n");
goto fail_map_attachment;
}

/* checking if dmabuf is big enough to store contiguous chunk */ contig_size = vb2_dc_get_contiguous_size(sgt); if (contig_size < buf->size) {
printk(KERN_ERR "contiguous chunk of dmabuf is too small\n");
ret = -EFAULT;
goto fail_map;
printk(KERN_ERR "contiguous chunk of dmabuf is too small "
	"%lu/%lu bytes\n", contig_size, buf->size);
goto fail_map_sg;
}

buf->dma_addr = sg_dma_address(sgt->sgl);
@@ -655,7 +641,10 @@ static int vb2_dc_map_dmabuf(void *mem_priv)

return 0;

-fail_map: +fail_map_sg:

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);

+fail_map_attachment: dma_buf_unmap_attachment(buf->db_attach, sgt);

return ret; @@ -676,6 +665,7 @@ static void vb2_dc_unmap_dmabuf(void *mem_priv) return; }

dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); dma_buf_unmap_attachment(buf->db_attach, sgt);

buf->dma_addr = 0;

-- Regards, Laurent Pinchart

Daniel Vetter

12:25 p.m.

New subject: [RFCv2 PATCH 7/9] v4l: vb2-dma-contig: change map/unmap behaviour

On Thu, Mar 22, 2012 at 13:15, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:

...

On Tuesday 13 March 2012 11:17:05 Tomasz Stanislawski wrote:

...
The DMABUF documentation says that the map_dma_buf callback should return scatterlist that is mapped into a caller's address space. In practice, almost none of existing implementations of DMABUF exporter does it. This patch breaks the DMABUF specification in order to allow exchange DMABUF buffers between other APIs like DRM.

Then it's time to fix the spec, and squash 6/9 and 7/9 together (I started reviewing 6/9 and the implementation puzzled me until I saw the "fixes" in 7/9).

Nope. The drm proof of concept stuff that just grabbed the struct pages pointers from the sg_table has always just been a gross hack to get things of the ground. With proper kernel cpu access and mmap support we can ditch these, and Dave Airlie has already started with that:

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2

Furthermore the kernel cpu access helpers are designed to just plug into the corresponding ttm helpers. It'll be slightly more messy for drm/i915 and udl because they don't use ttm.

And the afaik the proof of concept stuff from Rob Clark very much depends upon handing out addresses in the targets device address space. And there are other scenarios that just require this, besides that it makes imo more sense from an api design pov.

Yours, Daniel

-- Daniel Vetter daniel.vetter@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Laurent Pinchart

27 Mar 27 Mar

10:31 a.m.

New subject: [RFCv2 PATCH 7/9] v4l: vb2-dma-contig: change map/unmap behaviour

Hi Daniel,

On Thursday 22 March 2012 13:25:20 Daniel Vetter wrote:

...

On Thu, Mar 22, 2012 at 13:15, Laurent Pinchart wrote:

...
On Tuesday 13 March 2012 11:17:05 Tomasz Stanislawski wrote:

...
The DMABUF documentation says that the map_dma_buf callback should return scatterlist that is mapped into a caller's address space. In practice, almost none of existing implementations of DMABUF exporter does it. This patch breaks the DMABUF specification in order to allow exchange DMABUF buffers between other APIs like DRM.

Then it's time to fix the spec, and squash 6/9 and 7/9 together (I started reviewing 6/9 and the implementation puzzled me until I saw the "fixes" in 7/9).

Nope. The drm proof of concept stuff that just grabbed the struct pages pointers from the sg_table has always just been a gross hack to get things of the ground. With proper kernel cpu access and mmap support we can ditch these, and Dave Airlie has already started with that:

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2

Furthermore the kernel cpu access helpers are designed to just plug into the corresponding ttm helpers. It'll be slightly more messy for drm/i915 and udl because they don't use ttm.

And the afaik the proof of concept stuff from Rob Clark very much depends upon handing out addresses in the targets device address space. And there are other scenarios that just require this, besides that it makes imo more sense from an api design pov.

Let's continue this discussion in the "[Linaro-mm-sig] Minutes from V4L2 update call" mail thread if you don't mind to avoid scattering the topic all over. I've listed the two options there (mapping the buffer to the importer device's address space in the export driver or the importer driver).

-- Regards, Laurent Pinchart

Tomasz Stanislawski

13 Mar 13 Mar

10:17 a.m.

New subject: [RFCv2 PATCH 8/9] v4l: fimc: integrate capture i-face with dmabuf

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/Kconfig | 1 + drivers/media/video/s5p-fimc/fimc-capture.c | 11 ++++++++++- 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/media/video/Kconfig b/drivers/media/video/Kconfig index 9495b6a..c9963f0 100644 --- a/drivers/media/video/Kconfig +++ b/drivers/media/video/Kconfig @@ -1099,6 +1099,7 @@ config VIDEO_SAMSUNG_S5P_FIMC VIDEO_V4L2_SUBDEV_API && EXPERIMENTAL select VIDEOBUF2_DMA_CONTIG select V4L2_MEM2MEM_DEV + select DMA_SHARED_BUFFER ---help--- This is a v4l2 driver for Samsung S5P and EXYNOS4 camera host interface and video postprocessor. diff --git a/drivers/media/video/s5p-fimc/fimc-capture.c b/drivers/media/video/s5p-fimc/fimc-capture.c index a9e9653..7ecc36b 100644 --- a/drivers/media/video/s5p-fimc/fimc-capture.c +++ b/drivers/media/video/s5p-fimc/fimc-capture.c @@ -1011,6 +1011,14 @@ static int fimc_cap_qbuf(struct file *file, void *priv, return vb2_qbuf(&fimc->vid_cap.vbq, buf); }

+static int fimc_cap_expbuf(struct file *file, void *priv, + struct v4l2_exportbuffer *eb) +{ + struct fimc_dev *fimc = video_drvdata(file); + + return vb2_expbuf(&fimc->vid_cap.vbq, eb); +} + static int fimc_cap_dqbuf(struct file *file, void *priv, struct v4l2_buffer *buf) { @@ -1081,6 +1089,7 @@ static const struct v4l2_ioctl_ops fimc_capture_ioctl_ops = {

.vidioc_qbuf = fimc_cap_qbuf, .vidioc_dqbuf = fimc_cap_dqbuf, + .vidioc_expbuf = fimc_cap_expbuf,

.vidioc_streamon = fimc_cap_streamon, .vidioc_streamoff = fimc_cap_streamoff, @@ -1463,7 +1472,7 @@ int fimc_register_capture_device(struct fimc_dev *fimc, q = &fimc->vid_cap.vbq; memset(q, 0, sizeof(*q)); q->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; - q->io_modes = VB2_MMAP | VB2_USERPTR; + q->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF; q->drv_priv = fimc->vid_cap.ctx; q->ops = &fimc_capture_qops; q->mem_ops = &vb2_dma_contig_memops;

-- 1.7.5.4

Tomasz Stanislawski

10:17 a.m.

New subject: [RFCv2 PATCH 9/9] v4l: s5p-tv: mixer: integrate with dmabuf

Signed-off-by: Tomasz Stanislawski t.stanislaws@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- drivers/media/video/s5p-tv/Kconfig | 1 + drivers/media/video/s5p-tv/mixer_video.c | 12 +++++++++++- 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/drivers/media/video/s5p-tv/Kconfig b/drivers/media/video/s5p-tv/Kconfig index f248b28..2e80126 100644 --- a/drivers/media/video/s5p-tv/Kconfig +++ b/drivers/media/video/s5p-tv/Kconfig @@ -10,6 +10,7 @@ config VIDEO_SAMSUNG_S5P_TV bool "Samsung TV driver for S5P platform (experimental)" depends on PLAT_S5P && PM_RUNTIME depends on EXPERIMENTAL + select DMA_SHARED_BUFFER default n ---help--- Say Y here to enable selecting the TV output devices for diff --git a/drivers/media/video/s5p-tv/mixer_video.c b/drivers/media/video/s5p-tv/mixer_video.c index f7ca5cc..f08edbf 100644 --- a/drivers/media/video/s5p-tv/mixer_video.c +++ b/drivers/media/video/s5p-tv/mixer_video.c @@ -697,6 +697,15 @@ static int mxr_dqbuf(struct file *file, void *priv, struct v4l2_buffer *p) return vb2_dqbuf(&layer->vb_queue, p, file->f_flags & O_NONBLOCK); }

+static int mxr_expbuf(struct file *file, void *priv, + struct v4l2_exportbuffer *eb) +{ + struct mxr_layer *layer = video_drvdata(file); + + mxr_dbg(layer->mdev, "%s:%d\n", __func__, __LINE__); + return vb2_expbuf(&layer->vb_queue, eb); +} + static int mxr_streamon(struct file *file, void *priv, enum v4l2_buf_type i) { struct mxr_layer *layer = video_drvdata(file); @@ -724,6 +733,7 @@ static const struct v4l2_ioctl_ops mxr_ioctl_ops = { .vidioc_querybuf = mxr_querybuf, .vidioc_qbuf = mxr_qbuf, .vidioc_dqbuf = mxr_dqbuf, + .vidioc_expbuf = mxr_expbuf, /* Streaming control */ .vidioc_streamon = mxr_streamon, .vidioc_streamoff = mxr_streamoff, @@ -1074,7 +1084,7 @@ struct mxr_layer *mxr_base_layer_create(struct mxr_device *mdev,

layer->vb_queue = (struct vb2_queue) { .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, - .io_modes = VB2_MMAP | VB2_USERPTR, + .io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF, .drv_priv = layer, .buf_struct_size = sizeof(struct mxr_buffer), .ops = &mxr_video_qops,

-- 1.7.5.4

4844

days inactive

4859

days old

linaro-mm-sig@lists.linaro.org

43 comments

participants

tags (0)

participants (6)

Daniel Vetter
Daniel Vetter
Jerome Glisse
Laurent Pinchart
Subash Patel
Tomasz Stanislawski