Linaro-mm-sig June 2012

linaro-mm-sig@lists.linaro.org

33 participants
35 discussions

[PATCHv2 0/6] ARM: DMA-mapping: new extensions for buffer sharing

by Marek Szyprowski

Hello,

This is an updated version of the patch series introducing new features to the DMA mapping subsystem to let drivers share the allocated buffers (preferably using the recently introduced dma_buf framework) easily and efficiently.

The first extension is the DMA_ATTR_NO_KERNEL_MAPPING attribute. It is intended for use with the dma_{alloc, mmap, free}_attrs functions. It can be used to notify the dma-mapping core that the driver will not use a kernel mapping for the allocated buffer at all, so the core can skip creating it. This saves precious kernel virtual address space. Such a buffer can be accessed from userspace after calling dma_mmap_attrs() for it (a typical use case for multimedia buffers). The value returned by dma_alloc_attrs() with this attribute should be considered a DMA cookie, which needs to be passed to the dma_mmap_attrs() and dma_free_attrs() functions.

The second extension is required to let drivers share the buffers allocated by the DMA-mapping subsystem. Right now the driver gets a DMA address of the allocated buffer and the kernel virtual mapping for it. If it wants to share it with another device (= map into its DMA address space) it usually hacks around kernel virtual addresses to get pointers to pages, or assumes that both devices share the DMA address space. Both solutions are just hacks for special cases, which should be avoided in the final version of buffer sharing. To solve this issue in a generic way, a new call to DMA mapping has been introduced - dma_get_sgtable(). It allocates a scatter-list which describes the allocated buffer and lets the driver(s) use it with other device(s) by calling dma_map_sg() on it.

The third extension solves the performance issues which we observed with some advanced buffer sharing use cases, which require creating a DMA mapping for the same memory buffer for more than one device. From the DMA-mapping perspective this requires calling one of the dma_map_{page,single,sg} functions for the given memory buffer several times, once for each of the devices. Each dma_map_* call performs CPU cache synchronization, which might be a time consuming operation, especially when the buffers are large. We would like to avoid any useless and time consuming operations, so that was the main reason for introducing another attribute for the DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC, which lets the dma-mapping core skip CPU cache synchronization in certain cases.

The proposed patches have been rebased on the latest Linux kernel v3.5-rc2 with the 'ARM: replace custom consistent dma region with vmalloc' patches applied (for more information, please refer to the http://www.spinics.net/lists/arm-kernel/msg179202.html thread).

The patches together with all dependencies are also available on the following GIT branch:
git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.5-rc2-dma-ext-v2

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Changelog:

v2:
- rebased onto v3.5-rc2 and adapted for CMA and dma-mapping changes
- renamed dma_get_sgtable() to dma_get_sgtable_attrs() to match the convention of the other dma-mapping calls with attributes
- added generic fallback function for dma_get_sgtable() for architectures with simple dma-mapping implementations

v1: http://thread.gmane.org/gmane.linux.kernel.mm/78644
    http://thread.gmane.org/gmane.linux.kernel.cross-arch/14435 (part 2)
- initial version

Patch summary:

Marek Szyprowski (6):
  common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
  ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
  common: dma-mapping: introduce dma_get_sgtable() function
  ARM: dma-mapping: add support for dma_get_sgtable()
  common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
  ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute

 Documentation/DMA-attributes.txt         |   42 ++++++++++++++++++
 arch/arm/common/dmabounce.c              |    1 +
 arch/arm/include/asm/dma-mapping.h       |    3 +
 arch/arm/mm/dma-mapping.c                |   69 ++++++++++++++++++++++++------
 drivers/base/dma-mapping.c               |   18 ++++++++
 include/asm-generic/dma-mapping-common.h |   18 ++++++++
 include/linux/dma-attrs.h                |    2 +
 include/linux/dma-mapping.h              |    3 +
 8 files changed, 142 insertions(+), 14 deletions(-)

--
1.7.1.569.g6f426
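
Editor's illustration: the cost argument behind the third extension can be sketched with a small userspace model. This is not kernel code and not taken from the patches; every name prefixed with model_ or MODEL_ is hypothetical, and the "cache sync" is only a counter. It assumes the caller knows the buffer is already coherent after the first mapping, which is the kind of case the cover letter calls "certain cases".

```c
#include <stddef.h>

/* Hypothetical flag standing in for DMA_ATTR_SKIP_CPU_SYNC. */
#define MODEL_ATTR_SKIP_CPU_SYNC (1u << 0)

/* Toy buffer: only counts what a real mapping call would do. */
struct model_buffer {
	size_t size;
	unsigned int cache_syncs; /* CPU cache synchronizations performed */
	unsigned int mappings;    /* device mappings created */
};

/* Models a single dma_map_*() style call made on behalf of one device. */
static void model_map(struct model_buffer *buf, unsigned int attrs)
{
	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
		buf->cache_syncs++; /* the expensive step for large buffers */
	buf->mappings++;
}

/*
 * Maps the same buffer for ndev devices. Only the first mapping
 * synchronizes the CPU cache; the rest pass the skip attribute.
 */
static void model_share(struct model_buffer *buf, unsigned int ndev)
{
	unsigned int i;

	for (i = 0; i < ndev; i++)
		model_map(buf, i == 0 ? 0 : MODEL_ATTR_SKIP_CPU_SYNC);
}
```

In this model, sharing one buffer with three devices through model_share() creates three mappings but performs a single cache synchronization, where three plain model_map(buf, 0) calls would perform three.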

13 years, 1 month

[PATCH] ARM: dma-mapping: fix debug messages in dmabounce code

by Marek Szyprowski

This patch fixes the usage of uninitialized variables in dmabounce code introduced by commit a227fb92 ('ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops'):

arch/arm/common/dmabounce.c: In function ‘dmabounce_sync_for_device’:
arch/arm/common/dmabounce.c:409: warning: ‘off’ may be used uninitialized in this function
arch/arm/common/dmabounce.c:407: note: ‘off’ was declared here
arch/arm/common/dmabounce.c: In function ‘dmabounce_sync_for_cpu’:
arch/arm/common/dmabounce.c:369: warning: ‘off’ may be used uninitialized in this function
arch/arm/common/dmabounce.c:367: note: ‘off’ was declared here

Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
 arch/arm/common/dmabounce.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 9d7eb53..aa07f59 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -366,8 +366,8 @@ static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	struct safe_buffer *buf;
 	unsigned long off;
 
-	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
-		__func__, addr, off, sz, dir);
+	dev_dbg(dev, "%s(dma=%#x,sz=%zx,dir=%x)\n",
+		__func__, addr, sz, dir);
 
 	buf = find_safe_buffer_dev(dev, addr, __func__);
 	if (!buf)
@@ -377,8 +377,8 @@ static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 
 	BUG_ON(buf->direction != dir);
 
-	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
-		__func__, buf->ptr, virt_to_dma(dev, buf->ptr),
+	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x off=%#lx) mapped to %p (dma=%#x)\n",
+		__func__, buf->ptr, virt_to_dma(dev, buf->ptr), off,
 		buf->safe, buf->safe_dma_addr);
 	DO_STATS(dev->archdata.dmabounce->bounce_count++);
 
@@ -406,8 +406,8 @@ static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	struct safe_buffer *buf;
 	unsigned long off;
 
-	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
-		__func__, addr, off, sz, dir);
+	dev_dbg(dev, "%s(dma=%#x,sz=%zx,dir=%x)\n",
+		__func__, addr, sz, dir);
 
 	buf = find_safe_buffer_dev(dev, addr, __func__);
 	if (!buf)
@@ -417,8 +417,8 @@ static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 
 	BUG_ON(buf->direction != dir);
 
-	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
-		__func__, buf->ptr, virt_to_dma(dev, buf->ptr),
+	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x off=%#lx) mapped to %p (dma=%#x)\n",
+		__func__, buf->ptr, virt_to_dma(dev, buf->ptr), off,
 		buf->safe, buf->safe_dma_addr);
 	DO_STATS(dev->archdata.dmabounce->bounce_count++);
 
--
1.7.1.569.g6f426

13 years, 1 month

[PATCH][RFC] mm: Don't put CMA pages on per-cpu lists

by Laura Abbott

Currently, when freeing 0 order pages, CMA pages are treated the same as regular movable pages, which means they end up on the per-cpu page list. This means that the CMA pages are likely to be allocated for something other than contiguous memory. This increases the chance that the next alloc_contig_range will fail because pages can't be migrated.

Given the size of the CMA region is typically limited, it is best to optimize for success of alloc_contig_range as much as possible. Do this by freeing CMA pages directly instead of putting them on the per-cpu page lists.

Signed-off-by: Laura Abbott <lauraa(a)codeaurora.org>
---
 mm/page_alloc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e1c6f5..c9a6483 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1310,7 +1310,8 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * excessively into the page allocator
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE)
+			|| is_migrate_cma(migratetype)) {
 			free_one_page(zone, page, 0, migratetype);
 			goto out;
 		}
--
1.7.8.3
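
Editor's illustration: the decision this patch changes can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel's page allocator; the enum values and all model_ / MODEL_ names are hypothetical stand-ins, and the "patched" flag only selects between the behaviour before and after the diff.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's migrate types. */
enum model_migratetype {
	MODEL_UNMOVABLE,
	MODEL_RECLAIMABLE,
	MODEL_MOVABLE,
	MODEL_PCPTYPES,                 /* types below this have per-cpu lists */
	MODEL_RESERVE = MODEL_PCPTYPES,
	MODEL_CMA,
	MODEL_ISOLATE,
};

enum model_free_path {
	MODEL_FREE_TO_PCP_LIST, /* page is parked on a per-cpu list */
	MODEL_FREE_TO_BUDDY,    /* page goes straight back to the free lists */
};

/* Where does an order-0 page of the given type end up when freed? */
static enum model_free_path
model_free_order0(enum model_migratetype mt, bool patched)
{
	if (mt >= MODEL_PCPTYPES) {
		if (mt == MODEL_ISOLATE || (patched && mt == MODEL_CMA))
			return MODEL_FREE_TO_BUDDY;
		/* remaining types are treated like movable pages */
	}
	return MODEL_FREE_TO_PCP_LIST;
}
```

With patched set to false, a CMA page takes the per-cpu path like any movable page; with patched set to true it bypasses the per-cpu list, which is the behaviour the commit message argues for.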

13 years, 1 month

[PATCH 00/12] Support for dmabuf exporting for videobuf2

by Tomasz Stanislawski

Hello everyone,

These patches add support for DMABUF exporting to the V4L2 stack. The latest support for DMABUF importing was posted in [1]. The exporter part is dependent on the DMA mapping redesign [2], which is not merged into the mainline. Therefore it is posted as a separate patchset. Moreover, some patches depend on the vmap extension for DMABUF by Dave Airlie [3] and the sg_alloc_table_from_pages function [4].

Changelog:

v0: (RFC)
- updated setup of VIDIOC_EXPBUF ioctl
- doc updates
- introduced workaround to avoid using dma_get_pages
- removed caching of exported dmabuf to avoid existence of circular reference between dmabuf and vb2_dc_buf or resource leakage
- removed all 'change behaviour' patches
- initial support for exporting in s5p-mfc driver
- removal of vb2_mmap_pfn_range that is no longer used
- use sg_alloc_table_from_pages instead of creating sglist in vb2_dc code
- move attachment allocation to exporter's attach callback

[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/48730
[2] http://thread.gmane.org/gmane.linux.kernel.cross-arch/14098
[3] http://permalink.gmane.org/gmane.comp.video.dri.devel/69302
[4]

This patchset is rebased on 3.4-rc1 plus the following patchsets:

Marek Szyprowski (1):
  v4l: vb2-dma-contig: let mmap method to use dma_mmap_coherent call

Tomasz Stanislawski (11):
  v4l: add buffer exporting via dmabuf
  v4l: vb2: add buffer exporting via dmabuf
  v4l: vb2-dma-contig: add setup of sglist for MMAP buffers
  v4l: vb2-dma-contig: add support for DMABUF exporting
  v4l: vb2-dma-contig: add vmap/kmap for dmabuf exporting
  v4l: s5p-fimc: support for dmabuf exporting
  v4l: s5p-tv: mixer: support for dmabuf exporting
  v4l: s5p-mfc: support for dmabuf exporting
  v4l: vb2: remove vb2_mmap_pfn_range function
  v4l: vb2-dma-contig: use sg_alloc_table_from_pages function
  v4l: vb2-dma-contig: Move allocation of dbuf attachment to attach cb

 drivers/media/video/s5p-fimc/fimc-capture.c |    9 +
 drivers/media/video/s5p-mfc/s5p_mfc_dec.c   |   13 ++
 drivers/media/video/s5p-mfc/s5p_mfc_enc.c   |   13 ++
 drivers/media/video/s5p-tv/mixer_video.c    |   10 +
 drivers/media/video/v4l2-compat-ioctl32.c   |    1 +
 drivers/media/video/v4l2-dev.c              |    1 +
 drivers/media/video/v4l2-ioctl.c            |    6 +
 drivers/media/video/videobuf2-core.c        |   67 ++++++
 drivers/media/video/videobuf2-dma-contig.c  |  323 ++++++++++++++++++++++-----
 drivers/media/video/videobuf2-memops.c      |   40 ----
 include/linux/videodev2.h                   |   26 +++
 include/media/v4l2-ioctl.h                  |    2 +
 include/media/videobuf2-core.h              |    2 +
 include/media/videobuf2-memops.h            |    5 -
 14 files changed, 411 insertions(+), 107 deletions(-)

--
1.7.9.5

13 years, 1 month

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Erik Gilling

On Thu, Jun 7, 2012 at 4:35 AM, Tom Cooksey <tom.cooksey(a)arm.com> wrote:
> The alternate is to not associate sync objects with buffers and
> have them be distinct entities, exposed to userspace. This gives
> userspace more power and flexibility and might allow for use-cases
> which an implicit synchronization mechanism can't satisfy - I'd
> be curious to know any specifics here.

Time and time again we've had problems with implicit synchronization resulting in bugs where different drivers play by slightly different implicit rules. We're convinced the best way to attack this problem is to move as much of the command and control of synchronization as possible into a single piece of code (the compositor in our case.) To facilitate this we're going to be mandating this explicit approach in the K release of Android.

> However, every driver which
> needs to participate in the synchronization mechanism will need
> to have its interface with userspace modified to allow the sync
> objects to be passed to the drivers. This seemed like a lot of
> work to me, which is why I prefer the implicit approach. However
> I don't actually know what work is needed and think it should be
> explored. I.e. How much work is it to add explicit sync object
> support to the DRM & v4l2 interfaces?
>
> E.g. I believe DRM/GEM's job dispatch API is "in-order"
> in which case it might be easy to just add "wait for this fence"
> and "signal this fence" ioctls. Seems like vmwgfx already has
> something similar to this already? Could this work over having
> to specify a list of sync objects to wait on and another list
> of sync objects to signal for every operation (exec buf/page
> flip)? What about for v4l2?

If I understand you right, a job submission with explicit sync would become 3 submissions:

1) submit wait for pre-req fence job
2) submit render job
3) submit signal ready fence job

Does DRM provide a way to ensure these 3 jobs are submitted atomically? I also expect GPU vendors would like to get clever about GPU to GPU fence dependencies. That could probably be handled entirely in the userspace GL driver.

> I guess my other thought is that implicit vs explicit is not
> mutually exclusive, though I'd guess there'd be interesting
> deadlocks to have to debug if both were in use _at the same
> time_. :-)

I think this is an approach worth investigating. I'd like a way to either opt out of implicit sync or have a way to check if a dma-buf has an attached fence and detach it. Actually, that could work really well. Consider:

* Each dma_buf has a single fence "slot"
* on submission
   * the driver will extract the fence from the dma_buf and queue a wait on it.
   * the driver will replace that fence with its own completion fence before the job submission ioctl returns.
* dma_buf will have two userspace ioctls:
   * DETACH: will return the fence as an FD to userspace and clear the fence slot in the dma_buf
   * ATTACH: takes a fence FD from userspace and attaches it to the dma_buf fence slot. Returns an error if the fence slot is non-empty.

In the android case, we can do a detach after every submission and an attach right before.

-Erik
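
Editor's illustration: the single fence "slot" idea from the list above can be sketched as a tiny userspace state machine. This is only a model of the proposal, not an existing dma-buf interface: all model_ names are hypothetical, fences are plain structs instead of file descriptors, and the choice of -EBUSY for "slot is non-empty" is an assumption (the message only says an error is returned).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fence object; a real one would be exposed as an FD. */
struct model_fence {
	int id;
};

/* Hypothetical buffer carrying the single fence slot. */
struct model_dma_buf {
	struct model_fence *slot;
};

/* ATTACH: put a fence into the slot, failing if one is already there. */
static int model_attach(struct model_dma_buf *buf, struct model_fence *f)
{
	if (buf->slot)
		return -EBUSY; /* assumed error code */
	buf->slot = f;
	return 0;
}

/* DETACH: hand the current fence (possibly NULL) back and clear the slot. */
static struct model_fence *model_detach(struct model_dma_buf *buf)
{
	struct model_fence *f = buf->slot;

	buf->slot = NULL;
	return f;
}

/*
 * Job submission: the driver takes the fence it must wait on out of the
 * slot and installs its own completion fence before returning.
 */
static struct model_fence *
model_submit(struct model_dma_buf *buf, struct model_fence *completion)
{
	struct model_fence *wait_on = buf->slot;

	buf->slot = completion;
	return wait_on;
}
```

The Android flow described in the message then maps to: model_attach() right before a submission, model_submit() by the driver, and model_detach() right after, so the compositor holds every completion fence explicitly.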

13 years, 1 month

Re: [Linaro-mm-sig] Synchronization framework

by Maarten Lankhorst

Hey Erik,

On 07-06-12 19:35, Erik Gilling wrote:
> On Thu, Jun 7, 2012 at 1:55 AM, Maarten Lankhorst
> <m.b.lankhorst(a)gmail.com> wrote:
>> I haven't looked at intel and amd, but from a quick glance
>> it seems like they already implement fencing too, so just
>> some way to synch up the fences on shared buffers seems
>> like it could benefit all graphics drivers and the whole
>> userspace synching could be done away with entirely.
> It's important to have some level of userspace API so that GPU
> generated graphics can participate in the graphics pipeline. Think of
> the case where you have a software video codec streaming textures into
> the GPU. It needs to know when the GPU is done with those textures so
> it can reuse the buffer.
>
In the graphics case this problem already has to be handled without dma-buf, so adding any extra synchronization API for userspace that is only used when the bo is shared is a waste.

I do agree you need some way to synch userspace though, but I think adding a new API for userspace is not the way to go.

Cheers,
Maarten

PS: re-added cc's that seem to have fallen off from your mail.

13 years, 1 month

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Erik Gilling

Tom,

Is there more planned for KDS? It seems to be lacking some important features to be useful across many SoCs and graphics cards, and features needed by Android. Here's some general feedback on those gaps.

There is no way to share information between a buffer provider and a buffer consumer. This is important for architectures such as Tegra which have several hardware blocks that share common hardware synchronization.

There's no userspace API. There are several reasons this is necessary. First, some userspace code (such as GL libs) might need to get at the private data of the sync primitive in order to generate command lists for a piece of hardware. Second, it does not let userspace have control of, or even visibility into, the graphics pipeline. The direction we are moving in Android is to put more control over synchronization into the compositor and move it out of being implemented "behind the scenes" by every vendor. Third, there's no way for a userspace process to wait on a sync primitive.

There's no debugging or timing information tracked with the sync primitives. During development on new platforms and new OS versions we often have cases where the graphics pipeline stops making forward progress because one of the pieces (GPU, display, camera, dsp, userspace) has, itself, stopped making forward progress. Finding the root cause of these often hard to reproduce cases is difficult when you have to instrument every single driver.

It's unclear how you would attach a dependency on an EGL fence to a dma_buf. Maybe this would be an EGL extension where you pass in the fence and the dma_buf.

At Android we've been working on our own approach to this problem. I'll post those patches for discussion.

Cheers,
Erik

13 years, 1 month

[PATCHv6 00/13] Integration of videobuf2 with dmabuf

by Tomasz Stanislawski

Hello everyone,

This patchset adds support for DMABUF [2] importing to the V4L2 stack. The support for DMABUF exporting was moved to a separate patchset due to its dependency on the patches for the DMA mapping redesign by Marek Szyprowski [4].

v6:
- fixed missing entry in v4l2_memory_names
- fixed a bug occurring after get_user_pages failure
- fixed a bug caused by using invalid vma for get_user_pages
- prepare/finish no longer call dma_sync for dmabuf buffers

v5:
- removed change of importer/exporter behaviour
- fixes to vb2_dc_pages_to_sgt based on Laurent's hints
- changed pin/unpin words to lock/unlock in Doc

v4:
- rebased on mainline 3.4-rc2
- included missing importing support for s5p-fimc and s5p-tv
- added patch for changing map/unmap for importers
- fixes to Documentation part
- coding style fixes
- pairing {map/unmap}_dmabuf in vb2-core
- fixing variable types and semantics of arguments in videobuf2-dma-contig.c

v3:
- rebased on mainline 3.4-rc1
- split 'code refactor' patch into multiple smaller patches
- squashed fixes to Sumit's patches
- patchset is no longer dependent on 'DMA mapping redesign'
- separated path for handling IO and non-IO mappings
- add documentation for DMABUF importing to V4L
- removed all DMABUF exporter related code
- removed usage of dma_get_pages extension

v2:
- extended VIDIOC_EXPBUF argument from integer memoffset to struct v4l2_exportbuffer
- added patch that breaks DMABUF spec on (un)map_attachment callbacks but allows working with the existing implementation of DMABUF prime in DRM
- all dma-contig code refactoring patches were squashed
- bugfixes

v1: List of changes since [1].
- support for DMA API extension dma_get_pages; the function is used to retrieve pages used to create DMA mapping
- small fixes/code cleanup to videobuf2
- added prepare and finish callbacks to vb2 allocators, used to keep consistency between DMA and CPU access to the memory (by Marek Szyprowski)
- support for exporting of DMABUF buffer in V4L2 and Videobuf2, originated from [3]
- support for dma-buf exporting in vb2-dma-contig allocator
- support for DMABUF for s5p-tv and s5p-fimc (capture interface) drivers, originated from [3]
- changed handling for userptr buffers (by Marek Szyprowski, Andrzej Pietrasiewicz)
- let mmap method to use dma_mmap_writecombine call (by Marek Szyprowski)

[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/4296…
[2] https://lkml.org/lkml/2011/12/26/29
[3] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/3635…
[4] http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819

Laurent Pinchart (2):
  v4l: vb2-dma-contig: Shorten vb2_dma_contig prefix to vb2_dc
  v4l: vb2-dma-contig: Reorder functions

Marek Szyprowski (2):
  v4l: vb2: add prepare/finish callbacks to allocators
  v4l: vb2-dma-contig: add prepare/finish to dma-contig allocator

Sumit Semwal (4):
  v4l: Add DMABUF as a memory type
  v4l: vb2: add support for shared buffer (dma_buf)
  v4l: vb: remove warnings about MEMORY_DMABUF
  v4l: vb2-dma-contig: add support for dma_buf importing

Tomasz Stanislawski (5):
  Documentation: media: description of DMABUF importing in V4L2
  v4l: vb2-dma-contig: Remove unneeded allocation context structure
  v4l: vb2-dma-contig: add support for scatterlist in userptr mode
  v4l: s5p-tv: mixer: support for dmabuf importing
  v4l: s5p-fimc: support for dmabuf importing

 Documentation/DocBook/media/v4l/compat.xml         |    4 +
 Documentation/DocBook/media/v4l/io.xml             |  179 +++++++
 .../DocBook/media/v4l/vidioc-create-bufs.xml       |    1 +
 Documentation/DocBook/media/v4l/vidioc-qbuf.xml    |   15 +
 Documentation/DocBook/media/v4l/vidioc-reqbufs.xml |   45 +-
 drivers/media/video/s5p-fimc/Kconfig               |    1 +
 drivers/media/video/s5p-fimc/fimc-capture.c        |    2 +-
 drivers/media/video/s5p-tv/Kconfig                 |    1 +
 drivers/media/video/s5p-tv/mixer_video.c           |    2 +-
 drivers/media/video/v4l2-ioctl.c                   |    1 +
 drivers/media/video/videobuf-core.c                |    4 +
 drivers/media/video/videobuf2-core.c               |  207 +++++++-
 drivers/media/video/videobuf2-dma-contig.c         |  520 +++++++++++++++++---
 include/linux/videodev2.h                          |    7 +
 include/media/videobuf2-core.h                     |   34 ++
 15 files changed, 924 insertions(+), 99 deletions(-)

--
1.7.9.5

13 years, 1 month

Re: [Linaro-mm-sig] Synchronization framework

by Maarten Lankhorst

Hey,

For intel/nouveau hybrid graphics I'm interested in this since it would allow me to synchronize between intel and nvidia cards without waiting for rendering to complete.

I'm worried about the api though, nouveau and intel already have existing infrastructure to deal with fencing so exposing additional ioctl's will complicate the implementation. Would it be possible to never expose this interface to userspace but keep it inside the kernel only?

nouveau_gem_ioctl_pushbuf is what's used for nouveau. If any dmabuf synch framework could hook into that then userspace would never have to act differently on shared bo's.

I haven't looked at intel and amd, but from a quick glance it seems like they already implement fencing too, so just some way to synch up the fences on shared buffers seems like it could benefit all graphics drivers and the whole userspace synching could be done away with entirely.

Cheers,
Maarten

13 years, 1 month

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Erik Gilling

On Wed, Jun 6, 2012 at 6:33 AM, John Reitan <john.reitan(a)arm.com> wrote:
>> But maybe instead of inventing something new, we can just use 'struct
>> kthread_work' instead of 'struct kds_callback' plus the two 'void *'s?
>> If the user needs some extra args they can embed 'struct
>> kthread_work' in their own struct and use container_of() magic in the
>> cb.
>>
>> Plus this is a natural fit if you want to dispatch callbacks instead
>> on a kthread_worker, which seems like it would simplify a few things
>> when it comes to deadlock avoidance.. ie., not resource deadlock
>> avoidance, but dispatching callbacks when some lock is held.
>
> That sounds like a better approach.
> Will make a cleaner API, will look into it.

When Tom visited us for android graphics camp in the fall he argued that there were cases where we would want to avoid an extra schedule. Consider the case where the GPU is waiting for a render buffer that the display controller is using. If that render can be kicked off w/o acquiring locks, the display's vsync IRQ handler can call release, which in turn calls the GPU callback, which in turn kicks off the render very quickly w/o having to leave IRQ context.

One way around the locking issue with callbacks/async wait is to have async wait return a value to indicate that the resource has been acquired instead of calling the callback. This is the approach I chose in our sync framework.

-Erik
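
Editor's illustration: the "return a value instead of calling the callback" idea from the last paragraph can be sketched as follows. This is a userspace model under stated assumptions, not the Android sync framework and not KDS: all model_ names are hypothetical, there is no locking, and the resource remembers at most one waiter to keep the sketch short.

```c
#include <stddef.h>

typedef void (*model_cb)(void *arg);

/* Hypothetical resource with room for a single pending waiter. */
struct model_resource {
	int busy;
	model_cb waiter;
	void *waiter_arg;
};

/*
 * Returns 1 if the resource was free and is now owned by the caller;
 * the callback is NOT invoked in that case, so the caller can proceed
 * directly while holding whatever locks it likes. Returns 0 if the
 * request was queued and the callback will run on release.
 */
static int model_async_wait(struct model_resource *res, model_cb cb, void *arg)
{
	if (!res->busy) {
		res->busy = 1;
		return 1;
	}
	res->waiter = cb;
	res->waiter_arg = arg;
	return 0;
}

/* Releases the resource; ownership passes straight to a queued waiter. */
static void model_release(struct model_resource *res)
{
	model_cb cb = res->waiter;
	void *arg = res->waiter_arg;

	if (cb) {
		res->waiter = NULL;
		res->waiter_arg = NULL;
		cb(arg); /* resource stays busy, now owned by the waiter */
	} else {
		res->busy = 0;
	}
}

/* Example callback: counts how many times it has been invoked. */
static void model_count_cb(void *arg)
{
	++*(int *)arg;
}
```

The point of the design is that the callback never runs from inside model_async_wait(), so a caller that holds a lock during the wait call cannot re-enter its own callback; it only runs later, from the releasing context.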

13 years, 1 month
