Hello,
This patch series is a continuation of my work on implementing generic
IOMMU support in the DMA mapping framework for the ARM architecture. This
time I have focused on the DMA mapping framework itself. It turned out
that adding support for the common dma_map_ops structure was not as hard
as I initially thought. After some modifications most of the code fits
really well into the generic dma_map_ops methods.
The only change required to dma_map_ops is a new alloc function. During
the discussion at the Linaro Memory Management meeting in Budapest we
came up with the idea of having only one alloc/free/mmap function with
an additional attributes argument. This way all the different kinds of
architecture-specific buffer mappings can be hidden behind attributes,
without the need to create several versions of the dma_alloc_
function. I also noticed that the dma_alloc_noncoherent() function can
be implemented this way as well, with a DMA_ATTRIB_NON_COHERENT
attribute. Systems that simply define dma_alloc_noncoherent as
dma_alloc_coherent will just ignore this attribute.
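To illustrate the single-alloc-with-attributes idea, here is a minimal user-space sketch. The type and attribute names (struct sketch_dma_ops, SKETCH_ATTR_NON_COHERENT) are hypothetical stand-ins and not the kernel API. The sketch has one alloc method that takes an attribute bitmask, and the old entry points become thin wrappers around it. A platform that treats non-coherent memory as coherent simply ignores the bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical attribute bit mirroring DMA_ATTRIB_NON_COHERENT. */
#define SKETCH_ATTR_NON_COHERENT (1UL << 0)

/* Simplified stand-in for dma_map_ops: one alloc/free pair with attrs. */
struct sketch_dma_ops {
	void *(*alloc)(size_t size, uint64_t *handle, unsigned long attrs);
	void (*free)(size_t size, void *cpu_addr, uint64_t handle,
		     unsigned long attrs);
};

/* Recorded only so the behaviour of the sketch can be inspected. */
static unsigned long last_attrs;

/* A coherent-only platform: it accepts any attrs and simply ignores
 * the non-coherent hint, returning ordinary (coherent) memory. */
static void *plain_alloc(size_t size, uint64_t *handle, unsigned long attrs)
{
	void *p = malloc(size);

	last_attrs = attrs;
	if (p)
		*handle = (uint64_t)(uintptr_t)p; /* identity "bus address" */
	return p;
}

static void plain_free(size_t size, void *cpu_addr, uint64_t handle,
		       unsigned long attrs)
{
	(void)size;
	(void)handle;
	(void)attrs;
	free(cpu_addr);
}

static const struct sketch_dma_ops plain_ops = { plain_alloc, plain_free };

/* The old entry points become thin wrappers around the single method. */
static void *sketch_alloc_coherent(const struct sketch_dma_ops *ops,
				   size_t size, uint64_t *handle)
{
	assert(ops != NULL && ops->alloc != NULL);
	return ops->alloc(size, handle, 0);
}

static void *sketch_alloc_noncoherent(const struct sketch_dma_ops *ops,
				      size_t size, uint64_t *handle)
{
	assert(ops != NULL && ops->alloc != NULL);
	return ops->alloc(size, handle, SKETCH_ATTR_NON_COHERENT);
}
```

Adding a new buffer type in this model means defining a new attribute bit, not a new dma_alloc_* function.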
Another good use case for alloc methods with attributes is the
possibility of allocating a buffer without a valid kernel mapping. There
are a number of drivers (mainly V4L2 and ALSA) that only export the DMA
buffers to user space. Such drivers don't touch the buffer data at all.
For such buffers we can avoid creating a mapping in the kernel virtual
address space, saving precious vmalloc area. Such buffers could be
allocated with a new attribute, DMA_ATTRIB_NO_KERNEL_MAPPING.
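The saving can be modelled in a few lines. This is a hypothetical user-space sketch: SKETCH_ATTR_NO_KERNEL_MAPPING stands in for the proposed attribute, and a simple counter stands in for the vmalloc area. A buffer allocated with the attribute still gets a DMA handle, which is all the device and a user-space mmap need, but it does not consume kernel virtual address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bit mirroring DMA_ATTRIB_NO_KERNEL_MAPPING. */
#define SKETCH_ATTR_NO_KERNEL_MAPPING (1UL << 1)

/* Pretend budget of kernel virtual address space (the vmalloc area). */
static size_t vmalloc_left = 1 << 20;

struct sketch_buf {
	uint64_t dma_handle; /* needed by the device and for user-space mmap */
	int kernel_mapped;   /* 0 when no kernel virtual mapping exists */
};

static int sketch_alloc(struct sketch_buf *buf, size_t size, uint64_t phys,
			unsigned long attrs)
{
	assert(buf != NULL && size > 0);
	buf->dma_handle = phys;
	buf->kernel_mapped = 0;

	/* Export-only buffers (e.g. V4L2, ALSA) skip the kernel mapping. */
	if (attrs & SKETCH_ATTR_NO_KERNEL_MAPPING)
		return 0;

	/* Everything else needs a slice of the kernel mapping area. */
	if (size > vmalloc_left)
		return -1;
	vmalloc_left -= size;
	buf->kernel_mapped = 1;
	return 0;
}
```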
All the changes introduced in this patch series are intended to lay the
groundwork for the upcoming generic IOMMU integration into the DMA
mapping framework on the ARM architecture.
For more information about the proof-of-concept IOMMU implementation in
the DMA mapping framework, please refer to my previous set of patches:
http://www.spinics.net/lists/linux-mm/msg19856.html
I've tried to split the redesign into a set of single-step changes for
easier review and understanding. If there is anything that needs further
clarification, please don't hesitate to ask.
The patches are prepared on top of Linux Kernel v3.0-rc3.
The proposed changes have been tested on the Samsung Exynos4 platform.
I've also tested the dmabounce code (by manually registering support for
DMA bounce for some of the devices available on my board), although my
hardware has no such strict requirements. It would be great if someone
could test my patches on other ARM architectures to check that I didn't
break anything.
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (8):
ARM: dma-mapping: remove offset parameter to prepare for generic
dma_ops
ARM: dma-mapping: implement dma_map_single on top of dma_map_page
ARM: dma-mapping: use asm-generic/dma-mapping-common.h
ARM: dma-mapping: implement dma sg methods on top of generic dma ops
ARM: dma-mapping: move all dma bounce code to separate dma ops
structure
ARM: dma-mapping: remove redundant code and cleanup
common: dma-mapping: change alloc/free_coherent method to more
generic alloc/free_attrs
ARM: dma-mapping: use alloc, mmap, free from dma_ops
arch/arm/Kconfig | 1 +
arch/arm/common/dmabounce.c | 112 +++--
arch/arm/include/asm/device.h | 1 +
arch/arm/include/asm/dma-mapping.h | 835 +++++++++++++-----------------------
arch/arm/mm/dma-mapping.c | 278 +++++++------
include/linux/dma-attrs.h | 1 +
include/linux/dma-mapping.h | 13 +-
7 files changed, 539 insertions(+), 702 deletions(-)
rewrite arch/arm/include/asm/dma-mapping.h (66%)
--
1.7.1.569.g6f426
Hello everyone,
Like I've promised during the Memory Management summit at Linaro Meeting
in Budapest I continued the development of the CMA. The goal is to
integrate it as tight as possible with other kernel subsystems (like
memory management and dma-mapping) and finally merge to mainline.
This version introduces integration with DMA-mapping subsystem for ARM
architecture, but I believe that similar integration can be done for
other archs too. I've also rebased all the code onto latest v3.0-rc2
kernel.
A few words for those who see CMA for the first time:
The Contiguous Memory Allocator (CMA) makes it possible for device
drivers to allocate big contiguous chunks of memory after the system
has booted.
The main difference from similar frameworks is that CMA allows the
memory region reserved for big chunk allocations to be transparently
reused as system memory, so no memory is wasted when no big chunk is
allocated. Once an alloc request is issued, the framework migrates
system pages out of the way to create the required big chunk of
physically contiguous memory.
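The reuse-then-migrate idea can be shown with a toy user-space model. It is a sketch only, with hypothetical names: single pages stand in for pageblocks, and a state enum stands in for migratetypes. Movable system pages may borrow the reserved region, while unmovable pages are kept out of it. A big-chunk request evacuates any movable pages from the chosen range before claiming the range.

```c
#include <assert.h>
#include <stdbool.h>

enum page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_UNMOVABLE, PAGE_CMA_BUSY };

#define NR_PAGES  16
#define CMA_START 8  /* pages [CMA_START, CMA_END) are reserved for CMA */
#define CMA_END   16

static enum page_state pages[NR_PAGES]; /* all PAGE_FREE initially */

static bool in_cma(int i)
{
	return i >= CMA_START && i < CMA_END;
}

/* System page allocator: movable pages may borrow the CMA region,
 * unmovable pages must never be placed there. */
static int sys_alloc(bool movable)
{
	for (int i = 0; i < NR_PAGES; i++) {
		if (pages[i] != PAGE_FREE || (in_cma(i) && !movable))
			continue;
		pages[i] = movable ? PAGE_MOVABLE : PAGE_UNMOVABLE;
		return i;
	}
	return -1;
}

static void sys_free(int i)
{
	pages[i] = PAGE_FREE;
}

/* "Migrate" one movable page to a free page outside [lo, hi). */
static bool migrate_page(int src, int lo, int hi)
{
	for (int dst = 0; dst < NR_PAGES; dst++) {
		if ((dst >= lo && dst < hi) || pages[dst] != PAGE_FREE)
			continue;
		pages[dst] = PAGE_MOVABLE; /* contents would be copied here */
		pages[src] = PAGE_FREE;
		return true;
	}
	return false;
}

/* Big-chunk allocation: returns the first page of the chunk or -1. */
static int cm_alloc_pages(int count)
{
	for (int start = CMA_START; start + count <= CMA_END; start++) {
		int end = start + count, i;

		for (i = start; i < end; i++)
			if (pages[i] == PAGE_CMA_BUSY)
				break;
		if (i < end)
			continue; /* overlaps a chunk that is already allocated */

		for (i = start; i < end; i++)
			if (pages[i] == PAGE_MOVABLE && !migrate_page(i, start, end))
				return -1; /* nowhere to evacuate the page to */
		for (i = start; i < end; i++)
			pages[i] = PAGE_CMA_BUSY;
		return start;
	}
	return -1;
}
```

While no big chunk is allocated, the whole reserved region is usable for movable system pages. The only cost is paid at cm_alloc time, when the borrowed pages have to be moved out.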
For more information see the changelog and links to previous versions
of the CMA framework.
The current version of CMA is just an allocator that handles allocation
of contiguous memory blocks. The differences between this patchset and
Kamezawa's alloc_contig_pages() are:
1. alloc_contig_pages() requires MAX_ORDER alignment of allocations,
which may be unsuitable for embedded systems where a few MiBs are
required.
The lack of an alignment requirement means that several threads
might try to access the same pageblock/page. To prevent this from
happening CMA uses a mutex so that only one cm_alloc()/cm_free()
function may run at a time.
2. CMA may use its own migratetype (MIGRATE_CMA) which behaves
similarly to ZONE_MOVABLE but can be put in arbitrary places.
This is required for us since we need to define two disjoint memory
ranges inside system RAM (i.e. in two memory banks; not to be confused
with nodes).
3. alloc_contig_pages() scans memory in search of a range that could be
migrated. CMA, on the other hand, maintains its own allocator to
decide where to allocate memory for device drivers and then tries
to migrate pages from that part if needed. This is not strictly
required but I somehow feel it might be faster.
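Points 1 and 3 can be combined in a small user-space sketch. All names are hypothetical, and the real CMA code is structured differently. The allocator keeps its own record of which pages in the region are used. It accepts any power-of-two alignment instead of MAX_ORDER. Because small, loosely aligned chunks may share a pageblock, every alloc/free is serialized by a single mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define CMA_PAGES 64

struct sketch_cma {
	pthread_mutex_t lock; /* only one cm_alloc()/cm_free() at a time */
	bool used[CMA_PAGES];
};

static struct sketch_cma cma = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Allocate 'count' pages aligned to 'align' pages (any power of two,
 * not necessarily MAX_ORDER). Returns the first page index or -1. */
static int sketch_cm_alloc(struct sketch_cma *c, int count, int align)
{
	int ret = -1;

	assert(count > 0 && align > 0 && (align & (align - 1)) == 0);
	pthread_mutex_lock(&c->lock);
	for (int start = 0; start + count <= CMA_PAGES; start += align) {
		int i;

		for (i = start; i < start + count && !c->used[i]; i++)
			;
		if (i == start + count) {
			for (i = start; i < start + count; i++)
				c->used[i] = true;
			/* Real CMA would migrate pages out of the range here. */
			ret = start;
			break;
		}
	}
	pthread_mutex_unlock(&c->lock);
	return ret;
}

static void sketch_cm_free(struct sketch_cma *c, int start, int count)
{
	pthread_mutex_lock(&c->lock);
	for (int i = start; i < start + count; i++)
		c->used[i] = false;
	pthread_mutex_unlock(&c->lock);
}
```

A single coarse lock is the simplest way to make the check-then-claim sequence atomic. It costs concurrency, but large contiguous allocations are rare enough that this seems a reasonable trade-off for a first version.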
The integration with the ARM DMA-mapping subsystem is quite
straightforward. Once a CMA context is available, the alloc_pages() call
can be replaced by a cm_alloc() call.
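The replacement can be pictured as a per-device branch in the page-allocation back end of dma_alloc_coherent(). This is a hypothetical user-space sketch: struct sketch_cma_ctx, struct sketch_device and the counters stand in for the real CMA context, struct device, cm_alloc() and alloc_pages().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device CMA context; only counts pages here. */
struct sketch_cma_ctx {
	size_t allocated;
};

struct sketch_device {
	struct sketch_cma_ctx *cma; /* NULL if no context was assigned */
};

enum sketch_backend { FROM_ALLOC_PAGES, FROM_CMA };

static size_t buddy_allocated; /* stands in for alloc_pages() */

/* What a dma_alloc_coherent() back end might do: prefer the device's
 * CMA context when one is assigned, otherwise use the page allocator. */
static enum sketch_backend sketch_dma_alloc_pages(struct sketch_device *dev,
						  size_t count)
{
	assert(dev != NULL && count > 0);
	if (dev->cma) {
		dev->cma->allocated += count; /* cm_alloc() path */
		return FROM_CMA;
	}
	buddy_allocated += count; /* alloc_pages() path */
	return FROM_ALLOC_PAGES;
}
```

Drivers keep calling dma_alloc_coherent() unchanged. Only platform code decides which devices get a context.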
The current version has been tested on the Samsung S5PC110-based Aquila
machine with the s5p-fimc V4L2 driver. The driver itself uses the
videobuf2 dma-contig memory allocator, which in turn relies on
dma_alloc_coherent() from the DMA-mapping subsystem. By integrating CMA
with DMA-mapping we managed to get this driver working with CMA without
a single change in the driver or the videobuf2-dma-contig allocator.
TODO:
1. use struct page * or pfn internally instead of physical address
2. use some simple bitmap based allocator instead of genalloc
3. provide a function similar to dma_declare_coherent_memory(), which
will create and register a CMA area for a particular device
4. code cleanup and simplification
5. discussion
6. double-mapping issues with ARMv6+ and coherent memory
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Links to previous versions of the patchset:
v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>
Changelog:
v10:
1. Rebased onto 3.0-rc2 and resolved all conflicts
2. Simplified CMA to be just a pure memory allocator, for use
with platform/bus specific subsystems, like dma-mapping.
Removed all device specific functions and calls.
3. Integrated with ARM DMA-mapping subsystem.
4. Code cleanup here and there.
5. Removed private context support.
v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts
2. Fixed a bunch of nasty bugs that happened when the allocation
failed (mainly kernel oops due to NULL ptr dereference).
3. Introduced testing code: cma-regions compatibility layer and
videobuf2-cma memory allocator module.
v8: 1. The alloc_contig_range() function has now been separated from
CMA and put in page_alloc.c. This function tries to
migrate all LRU pages in specified range and then allocate the
range using alloc_contig_freed_pages().
2. Support for MIGRATE_CMA has been separated from the CMA code.
I have not tested whether CMA works with ZONE_MOVABLE but I see no
reason why it shouldn't.
3. I have added a @private argument when creating CMA contexts so
that one can reserve memory and not share it with the rest of
the system. This way, CMA acts only as allocation algorithm.
v7: 1. A lot of functionality that handled driver->allocator_context
mapping has been removed from the patchset. This is not to say
that this code is not needed, it's just not worth posting
everything in one patchset.
Currently, CMA is "just" an allocator. It uses its own
migratetype (MIGRATE_CMA) for defining ranges of pageblocks
which behave just like ZONE_MOVABLE but, unlike the latter, can
be put in arbitrary places.
2. The migration code that was introduced in the previous version
actually started working.
v6: 1. Most importantly, v6 introduces support for memory migration.
The implementation is not yet complete though.
Migration support means that when CMA is not using memory
reserved for it, page allocator can allocate pages from it.
When CMA wants to use the memory, the pages have to be moved
and/or evicted so as to make room for CMA.
To make it possible it must be guaranteed that only movable and
reclaimable pages are allocated in CMA controlled regions.
This is done by introducing a MIGRATE_CMA migrate type that
guarantees exactly that.
Some of the migration code is "borrowed" from Kamezawa
Hiroyuki's alloc_contig_pages() implementation. The main
difference is that, thanks to the MIGRATE_CMA migrate type, CMA
assumes that memory controlled by CMA is always movable or
reclaimable, so it makes allocation decisions regardless of
whether some pages are actually allocated and migrates them
if needed.
The most interesting patches from the patchset that implement
the functionality are:
09/13: mm: alloc_contig_free_pages() added
10/13: mm: MIGRATE_CMA migration type added
11/13: mm: MIGRATE_CMA isolation functions added
12/13: mm: cma: Migration support added [wip]
Currently, kernel panics in some situations which I am trying
to investigate.
2. cma_pin() and cma_unpin() functions have been added (after
a conversation with Johan Mossberg). The idea is that whenever
hardware does not use the memory (no transaction is on) the
chunk can be moved around. This would allow defragmentation to
be implemented if desired. No defragmentation algorithm is
provided at this time.
3. Sysfs support has been replaced with debugfs. I always felt
unsure about the sysfs interface and when Greg KH pointed it
out I finally got to rewrite it to debugfs.
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: 1. The "asterisk" flag has been removed in favour of requiring
that platform will provide a "*=<regions>" rule in the map
attribute.
2. The terminology has been changed slightly renaming "kind" to
"type" of memory. In the previous revisions, the documentation
indicated that device drivers define memory kinds and now,
v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with list of regions but an array of regions.
2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should be considered an "asterisk" region.
3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes list of regions.
v2: 1. The "cma_map" command line parameter has been removed. In exchange,
a SysFS entry has been created under kernel/mm/contiguous.
The intended way of specifying the attributes is
a cma_set_defaults() function called by platform initialisation
code. "regions" attribute (the string specified by "cma"
command line parameter) can be overwritten with command line
parameter; the other attributes can be changed during run-time
using the SysFS entries.
2. The behaviour of the "map" attribute has been modified
slightly. Currently, if no rule matches given device it is
assigned regions specified by the "asterisk" attribute. It is
by default built from the region names given in "regions"
attribute.
3. Devices can register private regions as well as regions that
can be shared but are not reserved using standard CMA
mechanisms. A private region has no name and can be accessed
only by devices that have the pointer to it.
4. The way allocators are registered has changed. Currently,
a cma_allocator_register() function is used for that purpose.
Moreover, allocators are attached to regions the first time
memory is registered from the region or when allocator is
registered which means that allocators can be dynamic modules
that are loaded after the kernel booted (of course, it won't be
possible to allocate a chunk of memory from a region if
allocator is not loaded).
5. Index of new functions:
+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+ dma_addr_t alignment)
+static inline int
+cma_info_about(struct cma_info *info, const char *regions)
+int __must_check cma_region_register(struct cma_region *reg);
+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+int cma_allocator_register(struct cma_allocator *alloc);
Patches in this patchset:
lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements
Some improvements to genalloc API (most importantly possibility to
allocate memory with alignment requirement).
mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added
Code "stolen" from Kamezawa. The first patch just moves code
around and the second provides a function that "allocates" already
freed memory.
mm: alloc_contig_range() added
This is what Kamezawa asked: a function that tries to migrate all
pages from given range and then use alloc_contig_freed_pages()
(defined by the previous commit) to allocate those pages.
mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
Introduction of the new migratetype and support for it in CMA.
MIGRATE_CMA works similarly to ZONE_MOVABLE except that almost any
memory range can be marked as one.
mm: cma: Contiguous Memory Allocator added
The core CMA code. Manages CMA contexts and performs memory
allocations.
ARM: integrate CMA with dma-mapping subsystem
Main client of the CMA framework. CMA serves as an alloc_pages()
replacement if the device has a CMA context assigned.
ARM: S5PV210: add CMA support for FIMC devices on Aquila board
Example of platform/board specific code that creates cma
context and assigns it to particular devices.
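The first patch in the list adds an alignment offset to bitmap_find_next_zero_area(). The exact signature in the patch may differ, so what follows is only a user-space sketch of the semantics as I understand them. The returned index is chosen so that (index + align_offset) is a multiple of the alignment. This matters when bit 0 of an allocator's bitmap corresponds to a page frame that is not itself aligned. The function name find_zero_area_off is hypothetical, and the sketch uses one bool per bit for clarity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Find 'nr' consecutive clear bits at or after 'start' such that
 * (index + align_offset) is a multiple of (align_mask + 1).
 * Returns 'size' if no such area exists. */
static size_t find_zero_area_off(const bool *map, size_t size, size_t start,
				 size_t nr, size_t align_mask,
				 size_t align_offset)
{
	size_t index, end, i;

	assert(((align_mask + 1) & align_mask) == 0); /* mask is 2^k - 1 */
again:
	while (start < size && map[start])
		start++;
	/* Round (start + offset) up to the alignment, then remove the offset. */
	index = ((start + align_offset + align_mask) & ~align_mask) - align_offset;
	end = index + nr;
	if (end > size)
		return size;
	for (i = index; i < end; i++) {
		if (map[i]) {
			start = i + 1; /* area is dirty: retry after this bit */
			goto again;
		}
	}
	return index;
}
```

For example, with bits 0 and 1 set and a 4-bit alignment, no offset yields index 4. An offset of 1 yields index 3, because bit 3 then corresponds to an aligned absolute position.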
Patch summary:
KAMEZAWA Hiroyuki (2):
mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added
Marek Szyprowski (3):
mm: cma: Contiguous Memory Allocator added
ARM: integrate CMA with dma-mapping subsystem
ARM: S5PV210: add CMA support for FIMC devices on Aquila board
Michal Nazarewicz (5):
lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements
mm: alloc_contig_range() added
mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
arch/arm/include/asm/device.h | 3 +
arch/arm/include/asm/dma-mapping.h | 19 ++
arch/arm/mach-s5pv210/Kconfig | 1 +
arch/arm/mach-s5pv210/mach-aquila.c | 26 +++
arch/arm/mm/dma-mapping.c | 60 +++++--
include/linux/bitmap.h | 24 ++-
include/linux/cma.h | 189 ++++++++++++++++++
include/linux/genalloc.h | 50 +++---
include/linux/mmzone.h | 43 ++++-
include/linux/page-isolation.h | 50 ++++--
lib/bitmap.c | 22 ++-
lib/genalloc.c | 190 +++++++++++--------
mm/Kconfig | 29 +++-
mm/Makefile | 1 +
mm/cma.c | 358 +++++++++++++++++++++++++++++++++++
mm/compaction.c | 10 +
mm/internal.h | 3 +
mm/memory_hotplug.c | 111 -----------
mm/page_alloc.c | 292 ++++++++++++++++++++++++++---
mm/page_isolation.c | 130 ++++++++++++-
20 files changed, 1319 insertions(+), 292 deletions(-)
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c
Hello,
Following the discussion about the driver for the IOMMU controller on the
Samsung Exynos4 platform, and Arnd's suggestions, I've decided to start
working on a redesign of the dma-mapping implementation for the ARM
architecture. The goal is to add support for IOMMU in the way preferred
by the community :)
Some of the ideas about merging the dma-mapping API and the IOMMU API
come from the following threads:
http://www.spinics.net/lists/linux-media/msg31453.html
http://www.spinics.net/lists/arm-kernel/msg122552.html
http://www.spinics.net/lists/arm-kernel/msg124416.html
They were also discussed on Linaro memory management meeting at UDS
(Budapest 9-12 May).
I've finally managed to clean up my work a bit and present the initial,
very proof-of-concept version of the patches that were ready just before
the Linaro meeting.
What has been implemented:
1. Introduced arm_dma_ops
dma_map_ops from include/linux/dma-mapping.h suffers from the following
limitations:
- lack of start address for sync operations
- lack of write-combine methods
- lack of mmap to user-space methods
- lack of map_single method
For the initial version I've decided to use a custom arm_dma_ops.
Extending the common interface will take time; until then I wanted to
have something already working.
dma_{alloc,free,mmap}_{coherent,writecombine} have been consolidated
into dma_{alloc,free,mmap}_attrib, as suggested at the Linaro meeting.
A new attribute for WRITE_COMBINE memory has been introduced.
2. Moved all inline ARM dma-mapping related operations to
arch/arm/mm/dma-mapping.c and put them as methods in the generic
arm_dma_ops structure. The dma-mapping.c code definitely needs cleanup,
but this is just a first step.
3. Added very initial IOMMU support. Right now it is limited to
dma_alloc_attrib, dma_free_attrib and dma_mmap_attrib. It has been
tested with the s5p-fimc driver on the Samsung Exynos4 platform.
4. Adapted the Samsung Exynos4 IOMMU driver to make use of the
introduced iommu_dma proposal.
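The combination of per-device ops with attribute-based allocation can be sketched in user space as follows. All names are hypothetical; the real arm_dma_ops carries many more methods. Each device may point at its own ops table, for example an IOMMU-backed one, and falls back to the default linear-mapping ops otherwise. dma_alloc_writecombine then becomes a wrapper that passes the write-combine attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute bit for write-combined memory. */
#define SKETCH_ATTR_WRITE_COMBINE (1UL << 0)

struct sketch_device;

/* Cut-down stand-in for the proposed arm_dma_ops. */
struct sketch_arm_dma_ops {
	int (*alloc)(struct sketch_device *dev, size_t size,
		     unsigned long attrs);
};

struct sketch_device {
	const struct sketch_arm_dma_ops *dma_ops; /* NULL: default ops */
	unsigned long last_attrs;
};

/* Each back end returns an id so the sketch can show who served it. */
static int linear_alloc(struct sketch_device *dev, size_t size,
			unsigned long attrs)
{
	(void)size;
	dev->last_attrs = attrs;
	return 1; /* non-IOMMU path */
}

static int iommu_alloc(struct sketch_device *dev, size_t size,
		       unsigned long attrs)
{
	(void)size;
	dev->last_attrs = attrs;
	return 2; /* IOMMU path */
}

static const struct sketch_arm_dma_ops linear_ops = { linear_alloc };
static const struct sketch_arm_dma_ops iommu_ops = { iommu_alloc };

static const struct sketch_arm_dma_ops *sketch_get_ops(struct sketch_device *dev)
{
	return dev->dma_ops ? dev->dma_ops : &linear_ops;
}

static int sketch_dma_alloc_attrib(struct sketch_device *dev, size_t size,
				   unsigned long attrs)
{
	assert(dev != NULL && size > 0);
	return sketch_get_ops(dev)->alloc(dev, size, attrs);
}

/* The former per-type entry points collapse into wrappers. */
static int sketch_dma_alloc_coherent(struct sketch_device *dev, size_t size)
{
	return sketch_dma_alloc_attrib(dev, size, 0);
}

static int sketch_dma_alloc_writecombine(struct sketch_device *dev, size_t size)
{
	return sketch_dma_alloc_attrib(dev, size, SKETCH_ATTR_WRITE_COMBINE);
}
```

With this layout the IOMMU support becomes just another ops table, selected per device, and the wrappers do not need to know which one is in use.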
This patch series contains only patches for common dma-mapping part.
There is also a patch that adds driver for Samsung IOMMU controller on
Exynos4 platform. All required patches are available on:
git://git.infradead.org/users/kmpark/linux-2.6-samsung dma-mapping branch
Git web interface:
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads…
Future:
1. Add all missing operations for IOMMU mappings (map_single/page/sg,
sync_*)
2. Move sync_* operations into separate function for better code sharing
between iommu and non-iommu dma-mapping code
3. Splitting out dma bounce code from non-bounce into separate set of
dma methods. Right now dma-bounce code is compiled conditionally and
spread over arch/arm/mm/dma-mapping.c and arch/arm/common/dmabounce.c.
4. Merging dma_map_single with dma_map_page. I haven't investigated
deeply why they have separate implementation on ARM. If this is a
requirement then dma_map_ops need to be extended with another method.
5. Fix dma_alloc to unmap from linear mapping.
6. Convert IO address space management code from gen-alloc to some
simpler bitmap based solution.
7. Resolve issues that might arise during discussion & comments
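Item 4 amounts to reducing a kernel virtual address to a (page, offset) pair and reusing the page-based path. The following is a user-space sketch under stated assumptions. A page frame number stands in for struct page. The linear-map constants (PAGE_OFFSET 0xC0000000, RAM starting at 0x40000000) are illustrative. Without an IOMMU or bounce buffer, the bus address is assumed to equal the physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT  12
#define SK_PAGE_SIZE   (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_OFFSET 0xC0000000UL /* assumed start of the linear map */
#define SK_PHYS_OFFSET 0x40000000UL /* assumed start of RAM */

typedef uint64_t sk_dma_addr_t;

/* Page frame number stands in for struct page * in this sketch. */
static unsigned long sk_virt_to_pfn(unsigned long vaddr)
{
	assert(vaddr >= SK_PAGE_OFFSET);
	return (vaddr - SK_PAGE_OFFSET + SK_PHYS_OFFSET) >> SK_PAGE_SHIFT;
}

static unsigned long sk_offset_in_page(unsigned long vaddr)
{
	return vaddr & (SK_PAGE_SIZE - 1);
}

/* Page-based mapping: cache maintenance would also happen here. */
static sk_dma_addr_t sk_dma_map_page(unsigned long pfn, unsigned long offset,
				     size_t size)
{
	(void)size;
	return ((sk_dma_addr_t)pfn << SK_PAGE_SHIFT) + offset;
}

/* dma_map_single() expressed on top of dma_map_page(). */
static sk_dma_addr_t sk_dma_map_single(unsigned long vaddr, size_t size)
{
	return sk_dma_map_page(sk_virt_to_pfn(vaddr), sk_offset_in_page(vaddr),
			       size);
}
```

If this reduction holds for all callers, dma_map_ops would not need a separate map_single method.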
Please note that this is a very early version of the patches, definitely NOT
intended for merging. I just wanted to make sure that the direction is
right and share the code with others that might want to cooperate on
dma-mapping improvements.
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (2):
ARM: Move dma related inlines into arm_dma_ops methods
ARM: initial proof-of-concept IOMMU mapper for DMA-mapping
arch/arm/Kconfig | 1 +
arch/arm/include/asm/device.h | 3 +
arch/arm/include/asm/dma-iommu.h | 30 ++
arch/arm/include/asm/dma-mapping.h | 653 +++++++++++------------------
arch/arm/mm/dma-mapping.c | 817 +++++++++++++++++++++++++++++++++---
arch/arm/mm/vmregion.h | 2 +-
include/linux/dma-attrs.h | 1 +
7 files changed, 1033 insertions(+), 474 deletions(-)
create mode 100644 arch/arm/include/asm/dma-iommu.h
Hi all,
As Linaro refines its release cycles and related processes, we would
very much like to ensure that work stays on track and that we're able
to make progress. To that end, we'd like to have a sync-up meeting on
IRC (#linaro-mm-sig on irc.linaro.org or irc.freenode.net) to cover
status and next steps on the topics we've discussed in the summit, on
this list, and others. To address the timezone issue (there are a lot
of them between us), I'd like to offer to usurp the normal meeting
slot for the graphics working group, which, if nothing else, should
ensure that at least the working group members that are assigned to
memory management topics will be there; specifically, 1200UTC on
Wednesday, June 22. The meeting details are here:
https://wiki.linaro.org/OfficeofCTO/MemoryManagement/Notes/2011-06-22
I've put in a preliminary agenda, and the minutes and actions will be
culled from the channel log after the meeting. Please let me know if
you can't make it or if you want to see other items on the agenda but
can't edit the wiki for some reason (I'm not clear on the write access
to that page, but I'm happy to make proxy edits).
cheers,
Jesse
Hi,
I have the following use case for the UMM.
Samsung EXYNOS4 SoC has hardware IP for JPEG encoding/decoding. The
Android gallery application uses the JPEG decoder to draw the
images/thumbnail. As of now, skia library which handles this, uses a
software JPEG decoder for the same.
Buffer will be allocated by the Skia library for the JPEG image file to
be decoded. Now if we want to use hardware IP for decoding, we need to
a) Change the buffer allocation mechanism in Skia to get the buffers
from JPEG driver (mmapped)
b) Pass the user allocated buffer into the JPEG driver, and then to the
IP through the proposed DMA-IOMMU framework.
We feel (b) is the nicer way to handle this, with minimal changes to the
Android framework. The open question is whether the UMM is going to
address this scenario.
Please mail me back if you need any clarifications on the requirement.
Regards,
Subash
Hello Ilias,
I would prefer to have a fortnightly meeting at a preferred time of
14:00 UTC (to suit India and time zones further east). Also, conference
calls would be preferred.
Regards,
Subash
Samsung India - Linaro,
Bangalore - India.
Launchpad: https://launchpad.net/~subashp/
On 02/06/11 01:58, Jesse Barker wrote:
> > * Communication and Meetings
> > - New IRC channel #linaro-mm-sig for meetings and general
> > communication between those working on and interested in these topics
> > (already created).
> > - IRC meetings will be weekly with an option for the consituency to
> > decide on ultimate frequency (logs to be emailed to linaro-mm-sig
> > list).
> > - Linaro can provide wiki services and any dial-in needed.
> > - Next face-to-face meetings:
> > . Linaro mid-cycle summit (August 1-5, see
> > https://wiki.linaro.org/Events/2011-08-LDS)
> > . Linux Plumbers Conference (September 7-9, see
> > http://www.linuxplumbersconf.org/2011/ocw/proposals/567)
> > . V4L2 brainstorm meeting (Hans Verkuil to update with details)
> >
Since this is an area of key interest to many parties, a periodic
meeting could provide a channel to all who are interested to participate
and discuss. I can set it up and send Google calendar invitations.
One obvious issue with this idea: with participants on this list spread
across 5-6 time zones, finding a single meeting time would be
challenging, but perhaps a time slot around 16:00 UTC would suit most?
Means: IRC as Jesse mentioned above, we can also setup a call, via
Canonical's conferencing system.
Frequency: Is there a specific need for discussing weekly? Assuming
once-every-fortnight frequency, there could be around 2-3 meetings
before the Linaro sprint in August 1-5. Of course if there is
participation on the IRC channel, then communication can happen more
often...
There are a couple more items I wanted to ask about:
1. I think we need a single wiki page with all the relevant pointers and
consolidated info, at wiki.linaro.org. I can collect the information
pointers available and setup the wiki page.
2. Tracking work progress: certainly this work has been planned via
Launchpad blueprints added by Jesse. I do not know if everyone is on
Launchpad - I'd like to ask for suggestions on how to track work
progress especially from those who are not using Launchpad. Would
progress updates via the wiki suffice? If you have other suggestions
please let me know.
BR,
-- Ilias Biris, Aallonkohina 2D 19, 02320 Espoo, Finland Tel: +358 50
4839608 (mobile) Email: ilias dot biris at linaro dot org Skype:
ilias_biris
Memory Management Mini-Summit
Linaro Developer Summit, Budapest, May 9-11, 2011
=================================================
Hi all. Apologies for this report being so long in coming. I know
others have thrown in their perceptions and opinions on how the
mini-summit went, so I suppose it's my turn.
Outcomes:
---------
* Approach (full proposal under draft, to be sent to the lists below)
- Modified CMA for additional physically contiguous buffer support.
- dma-mapping API changes, enhancements and ARM architecture support.
- "struct dma_buf" based buffer sharing infrastructure with support
from device drivers.
- Pick any "low-hanging fruit" with respect to consolidation
(supporting the ARM arch/sub-arch goals).
* Proposal for work around allocation, mapping and buffer sharing to
be announced on:
- dri-devel
- linux-arm-kernel
- linux-kernel
- linux-media
- linux-mm
- linux-mm-sig
* Communication and Meetings
- New IRC channel #linaro-mm-sig for meetings and general
communication between those working on and interested in these topics
(already created).
- IRC meetings will be weekly with an option for the constituency to
decide on ultimate frequency (logs to be emailed to linaro-mm-sig
list).
- Linaro can provide wiki services and any dial-in needed.
- Next face-to-face meetings:
. Linaro mid-cycle summit (August 1-5, see
https://wiki.linaro.org/Events/2011-08-LDS)
. Linux Plumbers Conference (September 7-9, see
http://www.linuxplumbersconf.org/2011/ocw/proposals/567)
. V4L2 brainstorm meeting (Hans Verkuil to update with details)
Overview and Goals for the 3 days:
----------------------------------
* Day 1 - Component overviews, expected to spill over into day 2
* Day 2 - Concrete use case that outlines a definition of the problem
that we are trying to solve, and shows that we have solved it.
* Day 3 - Dig into the lower level details of the current
implementations. What do we have, what's missing, what's not
implemented for ARM.
This is about memory management, zero-copy pipelines, kernel/userspace
interfaces, memory management, memory reservations and much more :-)
In particular, what we would like to end up with is:
* Understand who is working on what; avoid work duplication.
* Focus on a specific problem we want to solve and discuss possible solutions.
* Come up with a plan to fix this specific problem.
* Start enumerating work items that the Linaro Graphics WG can work
on in this cycle.
Day 1:
------
The first day got off to a little bit of a stutter start as the summit
scheduler would not let us indicate that our desired starting time was
immediately after lunch, during the plenaries. However, that didn't
stop people from flocking to the session in droves. By the time I
made the kickoff comments on why we were there, and what we were there
to accomplish (see "Overview and Goals for the 3 days" above), we had
brought in an extra 10 chairs and there were people on the floor and
spilling out into the hallway.
Based upon our experiences from the birds-of-a-feather at the Embedded
Linux Conference, 2 things dominated day 1. First things first, I
assigned someone to take notes ;-). Etherpad made it really easy for
people to take notes collectively, including those participating
remotely, and for everyone to see who was writing what, but we
definitely needed someone whose focus would be capturing the
proceedings, so thanks to Dave Rusling for shouldering that burden.
The second thing was that we desperately needed an education in each
other's components and subsystems. Without this, we would risk missing
significant areas of discussion, or possibly even be violently
agreeing on something without realizing it. So, we started with a
series of component overviews. These were presentations on the order
of 20 minutes with some room for Q&A. On day 1, we had:
* V4L2 - Hans Verkuil
* DRM/GEM/KMS - Daniel Vetter
* TTM - Thomas Hellstrom
* CMA - Marek Szyprowski
* VCMM - Zach Pfeffer
All of these (as well as the ones from day 2) are available through
links on the mini-summit wiki
(https://wiki.linaro.org/Events/2011-05-MM).
Day 2:
------
The second day got off to a bit better a start than did day 1 as we
more clearly communicated the start time to everyone involved, and
forgot about the summit scheduler. We (conceptually) picked up where
day 1 left off with one more component overview:
* UMP - Ketil Johnson
and covered the MediaController API for good measure. From there, we
spent a fair amount of time discussing use cases to illustrate our
problem space. We started (via pre-summit submissions) with a couple
of variations on what amounted to basically the same thing. I think
the actual case is probably best illustrated by the pdf slides from
Sakari Ailus (see the link on the mini-summit wiki). Basically, we
want to take a video input, either from a camera or from a file,
decode it, process it, render to it and/or with it and display it.
These pipeline stages may be handled by hardware, by software on the
CPU or some combination of the two; each stage should be handled by
accepting a buffer from the last stage and operating on it in some
fashion (no copies wherever possible). It turned out that still image
capture can actually be a more complicated version of this use case,
but even something as simple as taking input through the camera and
displaying it (image preview) can involve much of the underpinnings
required to support the more complicated cases. We may indeed start
with this simple case as a proof-of-concept.
Once we had the use case nailed down, we moved onto the actual
components/subsystems that would need to share buffers in order for
the use case to work properly with the zero-copy (or at least
minimal-copy) requirement. We had:
* DRM
* V4L2
* fbdev
* ALSA
* DSP
* User-space (kind of all encompassing and could include things like
OpenCL, which also makes an interesting use case).
* DVB
* Out-of-tree GPU drivers
We wound up the day by discussing exactly what metadata we would want
to track in order to enable the desired levels of sharing with
simultaneous device mappings, cache management and other
considerations (e.g., device peculiarities). What we came up with is
a struct (we called it "dma_buf") that has the following info:
* Size
* Creator/Allocator
* Attributes:
- sharable?
- contiguous?
- device-local?
* Reference count
* Pinning reference count
* CPU cache management data
* Device private data (e.g., quirky tiling modes)
* Scatter list
* Synchronization data (for managing in-flight device transactions)
* Mapping data
These last few (device privates through mapping data) are lists of
data, one for each device that has a mapping of the buffer. The
mapping data is nominally an address and per-device cache management
data. We actually got through this part fairly quickly. The
biggest part of the discussion was what to use for handles/identifiers
in the buffer sharing scheme. The discussion was between global
identifiers like GEM uses, or file descriptors as favored by Android.
Initially, there was an informal consensus around unique IDs, though
it was not a definitive decision (yet). The atomicity of passing file
descriptors between processes makes them quite attractive for the
task.
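To make the metadata list above a little more concrete, here is a minimal user-space C sketch of such a structure. It assumes, as described above, that the per-device items (device private data, scatter list, synchronization data and mapping data) live in a list with one node per device that maps the buffer. All names (the fields, buf_create(), buf_map_device() and so on) are hypothetical illustrations, not an agreed or existing kernel API, and malloc() stands in for real allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical attribute bits for the "Attributes" entries. */
enum buf_attr {
    BUF_SHARABLE     = 1u << 0,
    BUF_CONTIGUOUS   = 1u << 1,
    BUF_DEVICE_LOCAL = 1u << 2,
};

/* One scatter list entry: a physically contiguous chunk. */
struct sg_chunk {
    uint64_t phys_addr;
    size_t length;
};

/* Per-device data: one node per device that currently maps the buffer. */
struct buf_mapping {
    const void *device;         /* opaque handle of the mapping device */
    void *device_priv;          /* device private data, e.g. tiling mode */
    struct sg_chunk *sg;        /* scatter list (owned by this node) */
    size_t sg_nents;
    void *sync;                 /* in-flight transaction tracking */
    uint64_t dma_addr;          /* buffer address as seen by this device */
    unsigned int cache_state;   /* per-device cache management data */
    struct buf_mapping *next;
};

struct dma_buf {
    size_t size;
    const char *creator;        /* creator/allocator */
    unsigned int attrs;         /* enum buf_attr bits */
    int refcount;               /* lifetime of the object */
    int pincount;               /* users that need it pinned in place */
    unsigned int cpu_cache_state;
    struct buf_mapping *mappings;
};

struct dma_buf *buf_create(size_t size, const char *creator, unsigned int attrs) {
    struct dma_buf *buf = calloc(1, sizeof(*buf));
    if (!buf)
        return NULL;
    buf->size = size;
    buf->creator = creator;
    buf->attrs = attrs;
    buf->refcount = 1;          /* the creator holds the first reference */
    return buf;
}

void buf_get(struct dma_buf *buf) {
    buf->refcount++;
}

/* Drop a reference; frees the buffer and all mapping nodes when the
   last reference goes away. Returns the remaining reference count. */
int buf_put(struct dma_buf *buf) {
    assert(buf->refcount > 0);
    int left = --buf->refcount;
    if (left == 0) {
        struct buf_mapping *m = buf->mappings;
        while (m) {
            struct buf_mapping *next = m->next;
            free(m->sg);
            free(m);
            m = next;
        }
        free(buf);
    }
    return left;
}

/* Record that a device has mapped the buffer at dma_addr. */
int buf_map_device(struct dma_buf *buf, const void *dev, uint64_t dma_addr) {
    struct buf_mapping *m = calloc(1, sizeof(*m));
    if (!m)
        return -1;
    m->device = dev;
    m->dma_addr = dma_addr;
    m->next = buf->mappings;
    buf->mappings = m;
    return 0;
}

size_t buf_count_mappings(const struct dma_buf *buf) {
    size_t n = 0;
    for (const struct buf_mapping *m = buf->mappings; m; m = m->next)
        n++;
    return n;
}
```

Keeping the per-device data in a list, rather than in fixed fields, is what lets several devices map the same buffer at once with their own addresses and cache state.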
Day 3:
------
By the third day, there was a sense of running out of time and really
needing to ensure that we left with a reasonable set of outcomes (see
the overview and goals section above). In short, we wanted to make
sure that we had a plan/roadmap, a reasonably actionable set of tasks
that could be picked up by Linaro engineers and community members
alike, and that we would not only avoid duplicating new work, but also
reduce some of the existing code duplication that got us to this point
in the first place.
But, we weren't done. We still had to cover the requirements around
allocation and explore the dma-mapping and IOMMU APIs.
This took most of the day, but was a quite fruitful set of
discussions. As with the rest of the discussions, we focused on
leveraging existing technologies as much as possible. With
allocations, however, this wasn't entirely possible as we have devices
on ARM SoCs that do not have an IOMMU and require physically
contiguous buffers in order to operate. After a fair amount of
discussion, it was decided that a modified version of the current CMA
(see Marek's slides linked from the wiki) would be used. It assumes the
pages are movable and manages them, not the mappings. There was concern
that the API didn't quite fit with other related APIs, so the changes
from the current state will be around those details.
On the mapping side, we focused on the dma-mapping API with
appropriate layering on the IOMMU API where appropriate. Without
going into crazy detail, we are looking at something like 4
implementations of the dma_map_ops functions for ARM: with and
without IOMMU, with and without bounce buffer (these last two exist,
but not using the dma_map_ops API). Marek has put out patches for
comment on the IOMMU-based implementation, based upon work he
had in progress. Also in the area of dma_map_ops, the sync-related
APIs need a start address and offset, and the alloc and free need
attribute parameters like map and unmap already have (to support
cacheable/coherent/write-combined). In the "not involving
dma_map_ops" category, we have a couple of changes that are likely to
be non-trivial (not that any of the other proposed work is). It was
proposed to modify (actually, the word thrown about in the discussions
was "fix") dma_alloc_coherent for ARM to support unmapping from the
kernel linear mapping and the use of HIGHMEM; two separate
implementations, configured at build-time. And, last but not least,
there was a fair amount of concern over the cache management API and
its ability to live cleanly with the IOMMU code and to resist breakage
from other architecture implementations.
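The idea of giving alloc and free an attributes argument, as map and unmap already have, can be sketched as a per-device ops table with a single allocation entry point whose behavior is selected by attribute bits. This is a hypothetical, heavily cut-down user-space model: sketch_dma_map_ops, the SKETCH_ATTR_* bits and the helper functions are invented for illustration and are not the kernel's dma_map_ops, and malloc() stands in for real page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical attribute bits; real kernel names and values may differ. */
#define SKETCH_ATTR_WRITE_COMBINE (1u << 0)
#define SKETCH_ATTR_NON_COHERENT  (1u << 1)

struct sketch_device;

/* Cut-down, hypothetical ops table: alloc/free take an attrs argument. */
struct sketch_dma_map_ops {
    void *(*alloc)(struct sketch_device *dev, size_t size,
                   uintptr_t *dma_handle, unsigned int attrs);
    void (*free)(struct sketch_device *dev, size_t size, void *cpu_addr,
                 uintptr_t dma_handle, unsigned int attrs);
};

struct sketch_device {
    const struct sketch_dma_map_ops *ops; /* chosen per device (IOMMU or not) */
    unsigned int last_attrs;              /* recorded only for demonstration */
};

/* "Direct" variant: no IOMMU, so the bus address equals the CPU address. */
static void *direct_alloc(struct sketch_device *dev, size_t size,
                          uintptr_t *dma_handle, unsigned int attrs) {
    void *p = malloc(size);
    if (!p)
        return NULL;
    /* A real implementation would choose cacheable, coherent or
       write-combined page attributes here based on attrs. */
    dev->last_attrs = attrs;
    *dma_handle = (uintptr_t)p;
    return p;
}

static void direct_free(struct sketch_device *dev, size_t size, void *cpu_addr,
                        uintptr_t dma_handle, unsigned int attrs) {
    (void)dev; (void)size; (void)dma_handle; (void)attrs;
    free(cpu_addr);
}

static const struct sketch_dma_map_ops direct_ops = {
    .alloc = direct_alloc,
    .free  = direct_free,
};

void sketch_device_init_direct(struct sketch_device *dev) {
    dev->ops = &direct_ops;
    dev->last_attrs = 0;
}

/* Single entry point: callers describe the kind of buffer via attrs
   instead of calling a separate dma_alloc_* variant for each kind. */
void *sketch_dma_alloc(struct sketch_device *dev, size_t size,
                       uintptr_t *dma_handle, unsigned int attrs) {
    assert(dev->ops != NULL);
    return dev->ops->alloc(dev, size, dma_handle, attrs);
}

void sketch_dma_free(struct sketch_device *dev, size_t size, void *cpu_addr,
                     uintptr_t dma_handle, unsigned int attrs) {
    assert(dev->ops != NULL);
    dev->ops->free(dev, size, cpu_addr, dma_handle, attrs);
}
```

An IOMMU-backed or bounce-buffer variant would supply its own ops table and be selected per device; callers of sketch_dma_alloc() would not change.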
At this point, we reviewed what we had done and finalized the outcomes
(see the outcomes section at the top). And, with half an hour to
spare, I re-instigated the file descriptors versus unique identifiers
discussion from day 2. I think file descriptors were winning by the
end (especially after people started posting pointers to code samples
of how to actually pass them between processes)...
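For reference, the usual way to pass a file descriptor between processes on Linux and other POSIX systems is sendmsg()/recvmsg() over a UNIX-domain socket with an SCM_RIGHTS control message; the kernel installs a new descriptor in the receiver that refers to the same open file. A minimal sketch follows. The helper names send_fd() and recv_fd() are chosen here for illustration and are not the code samples posted at the summit.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected UNIX-domain socket.
   Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd) {
    char byte = 'F';                /* at least one data byte must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                         /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In practice the two ends would be different processes, for example after fork() on a socketpair() or over a named socket. The descriptor arrives together with the message, so there is no window in which another process could look it up by name, which is the atomicity property mentioned on day 2.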
Attendees:
----------
I will likely miss people here trying to list out everyone, especially
given that some of the sessions were quite literally overflowing the
room we were in. For as accurate an account of attendance as I can
muster, check out the list of attendees on the mini-summit wiki page
or the discussion blueprints we used for scheduling:
https://wiki.linaro.org/Events/2011-05-MM#Attendees
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-m…
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-m…
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-m…
The occupants of the fishbowl (the front/center of the room in closest
proximity to the microphones) were primarily:
Arnd Bergmann
Laurent Pinchart
Hans Verkuil
Mauro Chehab
Daniel Vetter
Sakari Ailus
Thomas Hellstrom
Marek Szyprowski
Jesse Barker
The IRC fishbowl seemed to consist of:
Rob Morell
Jordan Crouse
David Brown
There were certainly others, both local and remote, participating to
varying degrees whom I do not intend to omit, and a special thanks
goes out to Joey Stanford for arranging a larger room for us on days 2
and 3, after we had people sitting on the floor and spilling into the
hallway during day 1.
On Mon, May 30, 2011 at 12:30 PM, PRASANNA KUMAR
<prasanna_tsm_kumar(a)yahoo.co.in> wrote:
> USB graphics devices from displaylink do not have 3D hardware. To get 3D
> effects (compiz, GNOME 3, KWin, OpenGL apps etc.) with these devices in
> Linux, the native (primary) GPU can be used to provide hardware
> acceleration. All the graphics operations are done using the native
> (primary) GPU and the end result is taken and sent to the displaylink
> device. Can this be achieved? If so, is it possible to implement a generic
> framework so that any device (USB, thunderbolt or any new technology) can
> use this just by implementing device-specific (compression and) data
> transport? I am not sure this is the correct mailing list.
fwiw, this situation is not too different from the SoC world. For
example, there are multiple ARM SoCs that share the same IMG/PowerVR
core or ARM/mali 3d core, but each has its own unique display
controller.
I don't know quite the best way to deal with this (either at the
DRM/kernel layer or xorg driver layer), but there would certainly be
some benefit in being able to make the DRM driver a bit more modular, to
combine a SoC-specific display driver (mostly the KMS part) with a
different 2d and/or 3d accelerator IP. Of course, the challenge here (or
part of it) is that different display controllers might have
different memory mgmt requirements (for ex, depending on whether the
display controller has an IOMMU or not) and formats, and that the flip
command should somehow come via the 2d/3d command stream.
I have an (experimental) DRM/KMS driver for OMAP which tries to solve
the issue by way of a simple plugin API, i.e. the idea being to separate
the PVR part from the OMAP display controller part more cleanly. I
don't think it is perfect, but it is an attempt. (I'll send patches
as an RFC, but wanted to do some cleanup first; just haven't had time
yet.) But I'm definitely open to suggestions here.
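As a rough illustration of the plugin idea (not the actual OMAP driver API, which is not shown in this thread), a display/KMS driver could expose a registration hook for an accelerator module and route page flips through it, so a flip can be ordered behind pending 2d/3d rendering as suggested above. Every name here (struct accel_plugin, display_register_plugin(), display_page_flip(), the stub plugin) is hypothetical, and the model runs in user space.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CRTCS   2
#define MAX_PLUGINS 4

/* Hypothetical hooks an accelerator (e.g. a 2d/3d core) registers with
   the SoC-specific display driver. */
struct accel_plugin {
    const char *name;
    /* Queue a flip behind pending rendering in the accelerator's command
       stream; the plugin calls display_flip_done() when it executes. */
    int (*queue_flip)(struct accel_plugin *self, unsigned int crtc,
                      unsigned int fb_id);
    unsigned int queued;   /* flips queued so far (demo bookkeeping) */
};

static struct accel_plugin *plugins[MAX_PLUGINS];
static size_t nplugins;
static unsigned int scanout_fb[MAX_CRTCS];  /* framebuffer shown per CRTC */

int display_register_plugin(struct accel_plugin *p) {
    if (!p || nplugins == MAX_PLUGINS)
        return -1;
    plugins[nplugins++] = p;
    return 0;
}

/* Completion path: this is where the display controller is programmed. */
void display_flip_done(unsigned int crtc, unsigned int fb_id) {
    assert(crtc < MAX_CRTCS);
    scanout_fb[crtc] = fb_id;
}

unsigned int display_current_fb(unsigned int crtc) {
    assert(crtc < MAX_CRTCS);
    return scanout_fb[crtc];
}

/* Flip request: route it through the first plugin that can order it
   after rendering, or flip immediately if no plugin is registered. */
int display_page_flip(unsigned int crtc, unsigned int fb_id) {
    if (crtc >= MAX_CRTCS)
        return -1;
    for (size_t i = 0; i < nplugins; i++)
        if (plugins[i]->queue_flip)
            return plugins[i]->queue_flip(plugins[i], crtc, fb_id);
    display_flip_done(crtc, fb_id);
    return 0;
}

/* Stand-in plugin: it has no real command stream, so it completes the
   flip immediately. */
static int stub_queue_flip(struct accel_plugin *self, unsigned int crtc,
                           unsigned int fb_id) {
    self->queued++;
    display_flip_done(crtc, fb_id);
    return 0;
}

struct accel_plugin stub_plugin = {
    .name = "stub-accel",
    .queue_flip = stub_queue_flip,
};
```

The display driver stays usable on its own, and the accelerator-specific part only has to provide the hooks; differing memory management requirements would need further hooks not modeled here.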
BR,
-R
> Thanks,
> Prasanna Kumar
> _______________________________________________
> dri-devel mailing list
> dri-devel(a)lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel